Query: 63 D NNSKIA 69

D N+S IA 
Sbjct: 62 DTTENDSLIA 71 

A related DNA sequence was identified in S.pyogenes <SEQ ID 581> which encodes the amino acid
sequence <SEQ ID 582>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0680 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 21/61 (34%) , Positives = 34/61 (55%) 

Query: 1 MYNRLKELRKDKGLTQADLAKVINTNQSQYGKYENGKTSLSIENSKILADFFGVSIPYLL 60
MY R++ LR+D TQ +A +++ + + Y K E G+ +L + + VSI YLL

Sbjct: 1 MYPRIRNLREDNDFTQKFVANLLSFSHANYAKIERGEVALMADVLVQFYKLYNVSIDYLL 60

Query: 61 G 61 
G 

Sbjct: 61 G 61 
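
The summary line that heads each alignment can be read mechanically. The following is a minimal Python sketch, not part of the original analysis: the function names `parse_summary` and `percent` are hypothetical, and the use of truncation (rather than rounding) for the percentage is an assumption that happens to reproduce the two figures quoted for this alignment. It is not guaranteed to reproduce every percentage printed elsewhere.

```python
import re

# Matches fields such as "Identities = 21/61" or "Gaps = 26/283".
SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Return {field: (count, aligned_length)} for one alignment summary line."""
    return {name: (int(num), int(den)) for name, num, den in SUMMARY_FIELD.findall(line)}


def percent(count, length):
    """Integer percentage, truncated rather than rounded (an assumption)."""
    return count * 100 // length


summary = parse_summary("Identities = 21/61 (34%) , Positives = 34/61 (55%)")
for name, (count, length) in summary.items():
    print(name, f"{count}/{length}", f"{percent(count, length)}%")
```

Here 21 identical positions over 61 aligned columns give 34%, and 34 positive positions give 55%.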


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 174 

A DNA sequence (GBSx0180) was identified in S.agalactiae <SEQ ID 583> which encodes the amino acid 
sequence <SEQ ID 584>. Analysis of this protein sequence reveals the following:

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5278 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
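
Result lines of this kind are regular enough to be tabulated automatically. The sketch below is illustrative only and assumes the three-field layout shown above (location, certainty, verdict). The names `RESULT_LINE`, `parse_results` and `top_location` are hypothetical, and the pattern is deliberately tolerant of stray spaces inside the number, which can appear when such output is extracted from scanned text.

```python
import re

# One "Final Results" line: location, certainty score, verdict label.
# \W* absorbs separators such as "---" or "=", and the score pattern
# allows stray spaces inside the number (e.g. "0. 5278").
RESULT_LINE = re.compile(
    r"^(?P<location>bacterial \w+)\W*Certainty\W*"
    r"(?P<score>[\d. ]+?)\s*\((?P<label>[^)]+)\)"
)


def parse_results(lines):
    """Return (location, certainty, label) tuples for lines that match."""
    results = []
    for line in lines:
        match = RESULT_LINE.match(line.strip())
        if match:
            score = float(match.group("score").replace(" ", ""))
            results.append((match.group("location"), score, match.group("label")))
    return results


def top_location(results):
    """Location with the highest certainty score."""
    return max(results, key=lambda item: item[1])[0]


sample = [
    "bacterial cytoplasm --- Certainty=0. 5278 (Affirmative) < succ>",
    "bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>",
    "bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>",
]
parsed = parse_results(sample)
print(parsed)
print(top_location(parsed))
```

For the values reported in Example 174 the highest-scoring location is the bacterial cytoplasm.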

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 175 

A DNA sequence (GBSx0181) was identified in S.agalactiae <SEQ ID 585> which encodes the amino acid 
sequence <SEQ ID 586>. Analysis of this protein sequence reveals the following:

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3762 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 176

A DNA sequence (GBSx0182) was identified in S.agalactiae <SEQ ID 587> which encodes the amino acid 
sequence <SEQ ID 588>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -9.66 Transmembrane 40 - 56 ( 33 - 65)
INTEGRAL Likelihood = -5.79 Transmembrane 62 - 78 ( 59 - 81)

Final Results

bacterial membrane --- Certainty=0.4864 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.pyogenes. 



A related GBS gene <SEQ ID 8505> and protein <SEQ ID 8506> were also identified. Analysis of this 
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7
McG: Discrim Score: -16.96
GvH: Signal Score (-7.5): -2.95
Possible site: 57
>>> Seems to have no N-terminal signal sequence

ALOM program count: 2 value: -9.66 threshold: 0.0

INTEGRAL Likelihood = -9.66 Transmembrane 33 - 49 ( 26 - 58)
INTEGRAL Likelihood = -5.79 Transmembrane 55 - 71 ( 52 - 74)
PERIPHERAL Likelihood = 10.87 14
modified ALOM score: 2.43

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4864 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
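
The two analyses in Example 176 report the same transmembrane likelihoods at different coordinates. A short Python sketch can make the comparison explicit. It is illustrative only: `Segment`, `INTEGRAL_LINE` and `parse_segments` are hypothetical names, and the meaning of the range in parentheses is not stated in the text, so it is simply carried along under the assumed field names `outer_start` and `outer_end`.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int
    outer_start: int  # hypothetical name for the first value in parentheses
    outer_end: int    # hypothetical name for the second value in parentheses


INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(lines):
    """Extract one Segment per INTEGRAL line; other lines are ignored."""
    segments = []
    for line in lines:
        match = INTEGRAL_LINE.search(line)
        if match:
            likelihood, *coords = match.groups()
            segments.append(Segment(float(likelihood), *map(int, coords)))
    return segments


first = parse_segments([
    "INTEGRAL Likelihood = -9.66 Transmembrane 40 - 56 ( 33 - 65)",
    "INTEGRAL Likelihood = -5.79 Transmembrane 62 - 78 ( 59 - 81)",
])
second = parse_segments([
    "INTEGRAL Likelihood = -9.66 Transmembrane 33 - 49 ( 26 - 58)",
    "INTEGRAL Likelihood = -5.79 Transmembrane 55 - 71 ( 52 - 74)",
])

offsets = [a.start - b.start for a, b in zip(first, second)]
lengths = [seg.end - seg.start + 1 for seg in first + second]
print(offsets, lengths)
```

With the values quoted above, both segments are shifted by the same seven positions between the two analyses, and every predicted segment spans 17 residues.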

Example 177

A DNA sequence (GBSx0183) was identified in S.agalactiae <SEQ ID 589> which encodes the amino acid 
sequence <SEQ ID 590>. Analysis of this protein sequence reveals the following: 



Possible site: 31 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3276 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 178 

A DNA sequence (GBSx0184) was identified in S.agalactiae <SEQ ID 591> which encodes the amino acid
sequence <SEQ ID 592>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3482 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9509> which encodes amino acid sequence <SEQ ID 9510>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA30291 GB:X07371 RepM protein (AA 1 - 314) [Staphylococcus 
aureus] 

Identities = 89/283 (31%) , Positives = 145/283 (50%) , Gaps = 26/283 (9%)

Query: 67 KVSLDNITMTAYIKSKKYLAMKQLIETHLAIWQTAMroMFRATTGDGIHVATLHMNYDKQ 126 

K+S D +T+ + + + I + + F+A + +++ YDK 

Sbjct: 42 KLSFDAMTIVGNLNKNSAKKLSDFMSLDPQIRLWDILQTKFKAKA LQEKVYIEYDKV 98


Query: 127 KGQDRKARPFRLEFNPNKLRLVDSEII DTIIPFLEDISISRADLAFDLFEVDCSEF- 182 

K R R+EFNPNKL E++ II ++ED +R DLAFD FE D S++

Sbjct: 99 KADTWDRRNMRVEFNPNKL--THDEMLWLKHNIIDYMEDDGFTRLDLAFD-FEDDLSDYY 155

Query: 183 -VLEKKGRPTATKEFRSSTGTLETKYLGAPRSEKQVRLYNKKKEQLQNGTDKDKDFASQF 241
+ EK + T F +TG ETKY G+ S + +R+YNKKKE+ +N D D +++ 
Sbjct: 156 ALSEKALKRTV FFGTTGKAETKYFGSRDSNRFIRIYNKKKERKENA DVDVSAE- 208 

Query: 242 KHWWRLEFQLRSRSIDEIFEVI -DTI IFKP- - FNLKGLSIETQIYLTALIHDKNIWKKLH 298 
H WR+E +L+ +D D I KP L+ L + +YL L+H+++ W +LH

Sbjct: 209 -HLWRVEIELKRD^WDYWNNCFNDIlHILKPAWATLESLKEQA^lVYL--LLHEESKWGELH 265 

Query: 299 RNTRARYKKILETHQTSDTDYLGLLKDLLKHERPRLENQLAYY 341 
RN+R +YK+I++ + S D L+K L L+ Q+ ++

Sbjct: 266 RNSRRKYKQIIQ--EISSIDLTDLMKSTLTDNEENLQKQINFW 306

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 179

A DNA sequence (GBSx0185) was identified in S.agalactiae <SEQ ID 593> which encodes the amino acid 
sequence <SEQ ID 594>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-15.55 Transmembrane 137 - 153 ( 133 - 157)

Final Results

bacterial membrane --- Certainty=0.7220 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9511> which encodes amino acid sequence <SEQ ID 9512>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8507> and protein <SEQ ID 8508> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2
McG: Discrim Score: -16.84
GvH: Signal Score (-7.5): -5.3
Possible site: 32
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -15.55 threshold: 0.0

INTEGRAL Likelihood =-15.55 Transmembrane 137 - 153 ( 133 - 157)
PERIPHERAL Likelihood = 10.93 60
modified ALOM score: 3.61

*** Reasoning Step: 3

bacterial membrane --- Certainty=0.7220 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01844(292 - 702 of 1074) . 
EGAD|124517|132830(149 - 295 of 435) apolipoprotein A-IV {Mus musculus}
GP|191889|gb|AAA37216.1||M64250 apolipoprotein A-IV {Mus musculus castaneus}

%Match =4.6 

%Identity =30.0 %Similarity =54.6 

Matches = 39 Mismatches = 53 Conservative Sub.s = 32 

201 231 261 291 321 351 381 411

NSSNIRY*LFRFAERLVEA*KTKTRKSARLLWG*DRQK*LSTLLLKIQYYQGVTRSKMRIKDYADSLGVSSQSIYKRIRS 

| :|:| = I = 11= 1=1 I = :: 

LRDRMMPHANKVTQTFGENMQKLQEHLKPYAVDLQDQ 

120 130 140 150 160 170 180 


435 462 492 522 552 570 

P--KYKERLKGHLY-RDNQKVENLDLIGIKILEDYHFENDVIELEKTLGD IQEEFEQEKKGMQY 

. I ||[|| 1 |: :| =: :| =:| :||:: :: :|: : 

KFNRNMEELKGHLTPRANELKATID QNLEDLRRSLAPLTVGVQEKLNHQMEGLAFQMKKNAEELQTK 

200 210 220 230 240 250

615 645 672 702 732 762 792 822 

- - -RIDRLADKLTPLLEDNQNLVQKNYE-LLNYWSLER 

vsakidqlqknLplvedvqskvkgntLlqkslkd

270 280 290 300 310 320 330 

SEQ ID 8508 (GBS405) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 171 (lane 4; MW 46kDa - 2 bands) and in Figure 177 (lane 7; MW 46kDa). It 
was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in
Figure 76 (lane 5; MW 21kDa). 

GBS405-GST was purified as shown in Figure 218, lane 8. 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 180 

A DNA sequence (GBSx0186) was identified in S.agalactiae <SEQ ID 595> which encodes the amino acid 
sequence <SEQ ID 596>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3406 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA33713 GB:X15669 pre protein (AA 1-494) [Streptococcus 
agalactiae] 

Identities = 171/402 (42%) , Positives = 250/402 (61%) , Gaps = 46/402 (11%) 

Query: 1 MSYWAR^KYKSGQLTAIYNHNERIFKNHSNKEIDVEKSHLNYELTNRDQAQNYHKQIK 60 

MSY+VARM K K+G L + HNER+F+ HSNK+I+ +SHLNYELT+RD++ +Y KQIK 
Sbjct: 1 MSYMVARMQKMKAGNLGGAFKHNERVFETHSNKDINPSRSHLNYELTDRDRSVSYEKQIK 60

Query: 61 EHINENRLSTRGVRKDAILCNEWIITSDKTFFDSLDEKQTREFFETAKDYFAEKYGDANI 120

+++NEN++S R +RKDA+LC+EWIITSDK FF+ LDE+QTR FFETAK+YFAE YG++NI 
Sbjct: 61 DYVNENKVSNRAIRKDAVLCDEWIITSDKDFFEKLDEEQTRTFFETAKNYFAENYGESNI 120

Query: 121 AYARVHLDESTPHMHLGIVPMKNGKLSSKALFGNKEKLVAIQDELPKYLNEHGFNLQRGE 180 

AYA VHLDESTPHMH+G+VP +NGKLSSKA+F ++E+L IQ++LP+Y+++HGF L+RG+ 
Sbjct: 121 AYASVHLDESTPHMHMGVVPFENGKLSSKAMF-DREELKHIQEDLPRYMSDHGFELERGK 179

Query: 181 IGSKKKHLETAEFKEKQRLLDNADRKLADKHEELKALDDKISNV-NDTIA 229 

+ S+ KH AEFK ++ +L +K+ +D++ + NDT A 

Sbjct: 180 LNSEAKHKTVAEFKRAMADME-LKEELLEKYHAPPFVDERTGELNNDTEAFWHEKEFADM 238 

Query: 230 -DKESRLKEL- - -EAKEWDAVGDLKQYELEKQSLAESIEDIKDIELLQLDRIQKEDLVKQ 285 

+ +S ++E E +W KQY+ E + L S + ++D D E+L+ + 

Sbjct: 239 FEVQS P I RETTNQEKMDWLR KQYQEELKKLESSKKPLED DLSHLEELLDK 288 

Query: 286 SFDGKLKMDKETYNRLFQTASKHASSNAELKRDLVKAQSQNNHLSREDLNHRKTAEKNIK 345 

+K+D E AS+ AS +L KA+ N L NH K+ E 1+ 

Sbjct: 289 KTKEYIKIDSE ASERAS ELSKAEGYINTLE NHSKSLEAKIE 329 

Query: 346 LSQENRKLKDKVKMLDEQVKILNKSLSVWKEKAKEFMPKQVY 387 

+ + +K K + K LN+S + K F+ K+ Y 

Sbjct: 330 CLESDNLQLEKQKATKLEAKAIoNESELRELKPKKNFLGKEHY 371 

A related DNA sequence was identified in S.pyogenes <SEQ ID 597> which encodes the amino acid 
sequence <SEQ ID 598>. Analysis of this protein sequence reveals the following: 

LPXTG motif: 2025-2030 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.08 Transmembrane 2034 -2050 (2030 -2053)
INTEGRAL Likelihood = -6.05 Transmembrane 21 - 37 ( 20 - 39)

Final Results

bacterial membrane --- Certainty=0.5034 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAD03320 GB:AF067776 extracellular matrix binding protein 
[Abiotrophia defectiva] 
Identities = 362/1396 (25%) , Positives = 591/1396 (41%) , Gaps = 87/1396 (6%)

Query: 636 KAEVKLKEAHEATKQAIEKDPWLSPEQKKAQKEKAKARLDEGLKALKAADSLEILKVTEE 695 

+A+ + A +A AI+ + L+ E+K A+K K+A + L + A KT 
Sbjct: 636 EAKNAVNNAAKAKOTAIDNMMLTAEEKAAEKAKVEAAKNATLAGIDQA KTTAA 689 


Query: 696 AFVDKEKNPDSIPNQHKAGTADQARKQALDSLDKEVQKELESIDNDNTLTTDEKAAAKICK 755 

+ K I + A A AL+ + ++ I LT +EK A + 

Sbjct: 690 RNAAQNKGTTDINAVNPVPVAKPAANAALE QAAVNKINEISQRPDLTREEKQAFMDQ 746 

Query: 756 VNDAYDVAKQTAMEANSYEDLTTI KDEFLS NLPHKQGTPLKDQQSDAIAELEKKQQE 812

V ADA A + + +T+ +D+ L+ NLP TP + +A+ + + 
Sbjct: 747 WTARDAAMAKVASAANNQAVTSARDQGLNAVNNLP TPAA-KYPEALGHVRQAADA 801 

Query: 813 IEKAIEGDKTLPRDEKEKQIADSKERLKSDTQKOTCDAKNADAIKKAFEEGKVNIPQAHIP 872 
+AI + L +E+ + + + + KA +G I

Sbjct: 802 KRQAIRDNANLTAEEQADALRQVDAAQTAAEAAINQNHTNATLAKADSDGVKAI 855 

Query: 873 GDLN KDKEKLLAELKQKADDTEKAIDVDKTLTEDEKKEQKVKTKAELEKAKTDVKNT 929 

D+N + K L+Q A +AI+ + LT++EK + + L AKT V+ 

Sbjct: 856 ffl)INPQPRSKPAANQALEQVAAAKRQAINNNNQLTDEEKAQAIQQVDQALANAKTQVQAA 915

Query: 930 QTREELDKKVPELKKAIEDTHVKGNLEGVKNKAIEDLKKAHTETVAKINGDDTLDKATKE 989 

+++ AI + + +G K +AI ++ A ++ G + L + 

Sbjct: 916 NDNNGVNQAKTAGTTAINNINPQGTQ KAQAIAAIEAAEQAKRLELQGRNDLTTEERN 972 


Query: 990 AQVKEADKAliAAGKDAITKADDADKVSTAVTEHTPKIKAAHKTGDLKKAQVDAOTALD^ 1049 

+ + A KDA+ +A + V+ A +1+ + T +K DA A+D+A 
Sbjct: 973 NALADLTAKAQAAKDAVNQARNOTGVAGAKDNGVAQIQGINPTAWKP DARNAIDQA 1029 

Query: 1050 AEKERGEINKDATLTTEDKAKQLKEVETALTKAKDNVKAAKTADAINDARDKGVATIDAV 1109

A + E + LT E+KA +K+V+ A AK + A + +N+A ++G A I A+ 
Sbjct: 1030 ARDKEAEFQANTKLTDEEKAAAIKKVQDAARDAKAAIDRAGSNGDVNNAVNQGKAAIQAI 1089 

Query: 1110 HKAGQDLGARKSGQVAKLEEAAKATKDKISADPTLTSKEKEEQSKAVDAELKKAIEAVNA 1169 
+ K A ++ AA A K I+A+ LT +EK K V+ E KA AV+A

Sbjct: 1090 KALDDSQPSAKDTAKAAIQNAADAKKAAITANNALTQEEKAAAIKQVEDEAAKAQAAVDA 1149 

Query: 1170 ADTADKVDDALGEGVTDIKNQHKSGDSIDARREAHGKELDRVAQETKGAIEKDPTLTTEE 1229 
+ + VD A +G+ I + ++ + +D+ A+K I D TLT EE 

Sbjct: 1150 SRSKADVDRAKDQGLQKISDV PAVQPPKLNAIAAVDQAATDKKAVINNDTTLTQEE 1205

Query: 1230 KAKQVKDVDAAKERGMAKLNEAKDADALDKAYGEGOTDIKNQHKSGDPVDARRGL^ 1289 

K ++ VD + +N+A + +G IN ++ A + ++ 

Sbjct: 1206 KEAAIRKVDEEAAKARQAINDATSNADVAAKQAQGTQAINNVPQT PAAKNAAKAAV 1261 


Query: 1290 DEVAQATKDAITADTTLTEAEKETQRGNVDKEATKAKEELAKAKDADALDKAYGDGVTSI 1349 

++ A A K AI D LT EK+ VD+E KA++ + A + +G +1 

Sbjct: 1262 EQAADAKKQAIENDPNLTRQEKDAAIAKVDQETNKARQAIDAATTNADVTAKQNEGTQAI 1321 

Query: 1350 KNQHKSGKGLDVRKDEHKKALEAVAKRVTAEIEADPTLTPEVREQQKAEVQKELELATDK 1409

++ K K + K A+ A+ + IE DP LT E ++ KA+V E A + 

Sbjct: 1322 NAVPQTPKA KTDAKMAOTQAAEDKKSAIENDPNLTREEKDAAKAKVDAEATKAKNA 1377 

Query: 1410 IAEAKDADEADKAYGDGOTAIENAHVIGKGIEARKDLAKKDLAEAAAKTKALI IEDKTLT 1469 
I A D+ +G AI + + + +A+ D AK + +AA + K I D LT

Sbjct: 1378 IDAATSNDDETAKQNEGTQAI NAVPQTPKAKTD-AKNAVTQAADRKKDAIENDPNLT 1433 

Query: 1470 DDQRKEQLLGVDTEYAKGIENIDAAKDAAGVDKAYSDGVRDILAQYKEGQNIM)RRNAAK 1529 
+++ VD E K + IDAA A V ++G +1 + + AK 

Sbjct: 1434 REEKVAAKAKVDAEAKKAKDAIDAATSNADVTAKQNEGTKR.I NDVPQTPTAKTDAK 1489



Query: 1530 EFLLKEADKVTKLINDDPTLTHDQKVDQINKVEQAKLDAIKSVDDAQTADAINDALGKGI 1589 
+ + AD I DP LT ++K KV+ A ++D A + + +G

Sbjct: 1490 NAOTQAADAKKDAIEKDP^TREEKDAAKAKVDAEAKKAKDAIDAATSNADVTAKQNEGT 1549 

Query: 1590 ENINNQYQHGDGVDTOKATAKGDLEKEAAKVKALIAKDPTLTQADKDKQTAAVDAAKNTA 1649 
+ IN+ Q K AK + + A K I KDP LT+ +KD A VDA A

Sbjct: 1550 KAINDVPQ TPTAKTDAKNAVTQAftDAKKDAIEKDPNLTREEKDAAKAKVDAEAKKA 1605 

Query: 1650 IAAVDKATTTEGINQELGKGITA1NKAYRPGEGVKARKEAAKADLEKEAAKVKALITNDP 1709 
A+D AT+ + + G AIN + KAK++A KIND 

Sbjct: 1606 KDAIDAATSNADVTAQKDAGKNAINAVPQ TPTAKTDAKNAVTQAADAKKDAIENDA 1661

Query: 1710 TLTKADK-AKQTEAVAKALKAAIAAVDKATTAEGINQELGKGITAINKAYRPGEGVKARK 1768 

LT+ +K A + + A+A KA A+D AT+ + + +G AIN + K 
Sbjct: 1662 NLTREEKDAAKAKVDAEATKAK-NAIDAATSNADVTAKQNEGTKAINDVPQ TPTAK 1716 


Query: 1769 EAAKADLEREAAKVREAIANDPTLTKADK-AKQTEAVAKALKAAIAAVDKATTAEGINQE 1827 

AK +++ A + AI NDP LT+ +K A + + A+A KA A+D AT+ + + 
Sbjct: 1717 TDAKNAVDQAATDKKSAIENDPALTREEKDAAKAKVDAEATKAK-NAIDAATSNADVTAQ 1775 

Query: 1828 LGKGITAINKAYRPGEGVEAHKEAAKANLEKVAKETKALISGDRYLSETEKAVQKQAVEQ 1887

G AIN + K AK +++ A + KA I D L+ EK K V+ 

Sbjct: 1776 KDAGKNAINAVPQ TPTAKTDAKNAVDQAATDKKAAIENDPALTREEKDAAKAKVDA 1831 

Query: 1888 Al^KALGQVEAAKTVEAVKLAENLGWAIRSAWAGLAKDTDQATAAIiNEAKQAAIEALK 1947 
KA ++AA +V++G KDA AKAA+

Sbjct: 1832 EAKKAKDAIDAATSNADVTAQKDAG KDAINAVPQTPTAKTDAKNAVD 1878 

Query: 1948 QAAAETLAKITTDAKLTEAQKAEQSENVSLALKTAIATVRSAQSIASVKEAKDKGITAIR 2007 
QAA + + I D LT +K V K A + +A S A V + +G AI 

Sbjct: 1879 QAATDKKSAIENDPALTREEKDAVKAKVDAEAKKAKDAIDAATSNADVTAKQTEGTQAIN 1938

Query: 2008 AAYVPNKAVAKS S SAN 2023 

A VP AK+ + N 
Sbjct: 1939 A- -VPQTPTAKTDAKN 1952 


An alignment of the GAS and GBS proteins is shown below: 

Identities = 77/396 (19%) , Positives = 157/396 (39%) , Gaps = 48/396 (12%) 

Query: 42 LNYELTNRDQAQNYHKQIKEHINENRLSTRGVRKDAILCNEWIITSDKTFFDSLDEKQTR 101 
L++E+ + ++QN K+I + + D E +1 K +++ EK T

Sbjct: 338 LDFEILH- PRSQNVSKKISKQVEAKPF DPASYKEKVIAKLKPVYEATSEKITN 389 

Query: 102 EFF- -ETAKDYFAEKYGDANIAYARVHLDESTPHMHLGIVPMKNGKLSSKALFG- -NKEK 157 
+ + E AKD +K + 1+ G V + +A+ NK 

Sbjct: 390 DAWLDENAKDLQKQKLEEQYIS GKVAI SEAGTKQEAI DAAYNKYS 434

Query: 158 LVAIQDELPKYLNEHGFNLQRGEIGSKKKHLETAEFKEKQRLLDN ADRKLADKHEEL 214 

D LP + N + + ++ ++T + K D K K E L 

Sbjct: 435 SQTDPDSLPSQYKQG--NKENEQEKGRQDLIQTRDLTLKAIQEDKWLTEQEKTIQKEEAL 492 


Query: 215 KALDDKISNVNDTIADKESRLKELEAKEWDAVGDLKQYE LEKQSLAESIE 264 

KA + I +VN T++ ++ + + + K + + K+Y EK+ A E 

Sbjct: 493 KAFETGIESWQTVSLEQLKQRLIVYKASEKDSEKKEYPESIPNQHIPGKEKEVKAAKQE 552 

Query: 265 DIKDIELLQLDRIQKEDLVKQSFDGKLKMDKETYNRLFQTASKHASSNAELKRDLVKAQS 324

++K + L++I +++ + E + QAKA+ +L+ DL S 

Sbjct: 553 ELKKLHDTTLEKINQDKWLTPDQQAEQLKQAEVTFKKGQEAIKSAQTLTQLETDLADYVS 612 

Query: 325 QNNHLSREDLNHRKTAEKNIKLSQENRKLKDKVKMLDEQVK ILNKSLSVWKEKAKE 380

+N + + K+ K+ +++ KLK+ + + ++ + + KEKAK

Sbjct: 613 EINEGKGNSIPDKYKSGNKDDLWKAEVKLKEAHEATKQAIEKDPWLSPEQKKAQKEKAKA 672 

Query: 381 FMPKQVYRETLSIINTLNPIGLAKTAIRQVKKMVDS 416 
+ + + + L ++L + + + A +K DS 
Sbjct: 673 RLDEGL--KALKAADSLEILKVTEEAFVDKEKNPDS 706
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 181 

A DNA sequence (GBSx0187) was identified in S.agalactiae <SEQ ID 599> which encodes the amino acid 
sequence <SEQ ID 600>. Analysis of this protein sequence reveals the following:

Possible site: 42 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2544 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 182 

A DNA sequence (GBSx0188) was identified in S.agalactiae <SEQ ID 601> which encodes the amino acid 
sequence <SEQ ID 602>. Analysis of this protein sequence reveals the following:

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2045 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 603> which encodes the amino acid
sequence <SEQ ID 604>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2045 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 102/111 (91%) , Positives = 107/111 (95%) 

Query: 1 MDYKKYQIIYAPDVLEKLKEIRDYISQNYSSTSGQHKMEQIISDIEKLEVFPEVGFDADE 60
+DYKKYQIIYAPDVLEKLKEIRDYISQNYSSTSGQ KMEQIISDIEKLEVFPEVGFDADE
Sbjct: 1 LDYKKYQIIYAPDVLEKLKEIRDYISQNYSSTSGQRKMEQIISDIEKLEVFPEVGFDADE 60

Query: 61 KYGSKISKYHSTRGYTLSKDYIVLYHIEEEENRVVIDYLLPTRSDYMKLFK 111
KYGSKI  YHST+GYTLSKDYIVLYHIE EENR+VIDYLLPT+SDY+KLFK
Sbjct: 61 KYGSKIIHYHSTKGYTLSKDYIVLYHIEGEENRIVIDYLLPTQSDYIKLFK 111

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 183 

A DNA sequence (GBSx0189) was identified in S.agalactiae <SEQ ID 605> which encodes the amino acid 
sequence <SEQ ID 606>. Analysis of this protein sequence reveals the following:

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1621 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 607> which encodes the amino acid
sequence <SEQ ID 608>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1596 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 91/95 (95%) , Positives = 93/95 (97%) 

Query: 1 MVTAEKNRAVTFQANKELVSFAMTVLNKKNLTLSSALRLFLQNVVVTNE 60 
M T +KNRAVTFQANKELVSEAMTVI1NKKNLTLSSALRLFLQ 
Sbjct: 1 MTTVKi<NRAVTFQANKELVSFJ^TVI^K^ 60

Query: 61 EKLFKQFQAEINKNIEDVRQGKFYTSEEVRSELGL 95 

EKLFKQFQAEINKNIEDVRQGKFYTSEEVR+ELGL 
Sbjct: 61 EKLFKQFQAEINKNIEDVRQGKFYTSEEVRAELGL 95 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 184 

A DNA sequence (GBSx0190) was identified in S.agalactiae <SEQ ID 609> which encodes the amino acid 
sequence <SEQ ID 610>. Analysis of this protein sequence reveals the following:

Possible site: 56 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4568 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9513> which encodes amino acid sequence <SEQ ID 9514> 
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 
>GP:CAA46375 GB:X65276 ORFA1 [Clostridium acetobutylicum]
Identities = 36/91 (39%) , Positives = 51/91 (55%) 

Query: 2 MSQIKLTPEELRISAQKYTTGSQSITDVLTVLTQEQAVIDENWDGTAFDSFEAQFNELSP 61 

M+QI +TPEEL+ AQY +1 ++ +IEWGAF++ Q+N+L 
Sbjct: 1 ^QISOTPEELKSQAQVYIQSKEEIDQAIQKVNSMNSTIAEEWKGQAFQAYLEQYNQLHQ 60 

Query: 62 KITQFAQLLEDINQQLLKVADWEQTDSDIA 92 

+ QF LLE +NQQL K AD V + D+ A 
Sbjct: 61 TWQFENLLESVNQQLNKYADTVAERDAQDA 91 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 185 

A DNA sequence (GBSx0191) was identified in S.agalactiae <SEQ ID 611> which encodes the amino acid
sequence <SEQ ID 612>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4523 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 186 

A DNA sequence (GBSx0192) was identified in S.agalactiae <SEQ ID 613> which encodes the amino acid 
sequence <SEQ ID 614>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5339 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
Example 187

A DNA sequence (GBSx0193) was identified in S.agalactiae <SEQ ID 615> which encodes the amino acid 
sequence <SEQ ID 616>. This protein is predicted to be a chromosome assembly protein. Analysis of this
protein sequence reveals the following: 

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4620 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 188 

A DNA sequence (GBSx0194) was identified in S.agalactiae <SEQ ID 617> which encodes the amino acid 
sequence <SEQ ID 618>. Analysis of this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4511 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 189 

A DNA sequence (GBSx0195) was identified in S.agalactiae <SEQ ID 619> which encodes the amino acid 
sequence <SEQ ID 620>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5249 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
Example 190

A DNA sequence (GBSx0196) was identified in S.agalactiae <SEQ ID 621> which encodes the amino acid
sequence <SEQ ID 622>. Analysis of this protein sequence reveals the following: 

Possible site: 14 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3542 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9515> which encodes amino acid sequence <SEQ ID 9516> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 191 

A DNA sequence (GBSx0197) was identified in S.agalactiae <SEQ ID 623> which encodes the amino acid 
sequence <SEQ ID 624>. Analysis of this protein sequence reveals the following:

Possible site: 15 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3098 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 192 

A DNA sequence (GBSx0198) was identified in S.agalactiae <SEQ ID 625> which encodes the amino acid 
sequence <SEQ ID 626>. This protein is predicted to be an rgg protein. Analysis of this protein sequence
reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3177 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA26968 GB:M89776 rgg [Streptococcus gordonii] 
Identities = 74/277 (26%) , Positives = 142/277 (50%) 
Query: 7 IFREFRLNRQFSLKQVASNELSVSQLSRFERGESDLSLTKFLGALKAIDLSISEFMDRVN 66

I + R ++ SLK+VA+ ++SV+QLSR+ERG S L++ F L + +S++EF + 
Sbjct: 10 ILKIIRESKNMSLKEVAAGDISVAQLSRYERGISSLTTOSFYSCLRNMSVSLAEFQYVYH 69 


Query: 67 KYQKSDQISLMSQMAQYHYQRDVAGLEKMISVEEGKLKKDSSDIRCRLNIVLFRGMICEC 126 

y +++ D + L ++++ + ++ LE +++ E ++ +LN ++ R + C 

Sbjct: 70 NYREADDVVLSQKLSEAQRENNIVKLESILAGSEA^QEFPEKKNYKmTIVIRATLTSC 129 

Query: 127 DSSRKMSEEDLCFLSDYLFQKDSWEISDYILIGNLYRYYNTRHICQLVKEVINQKEYYRD 186

+ ++S+ D+ FL+DYLF + W + L N + E+IN+ ++Y + 

Sbjct: 130 NPDYQVSKGDIEFLTDYLFSVEEWGRYELWLFTNSVNLLTLETLETFASEMINRTQFYNN 189 

Query: 187 IYTNRKTWEATLLNVVETLIERRALEEATFFLEKVEALtNNERNA 246 
+ NR + LLNW IE L+ A FL ++ E + Y R+++ Y K +Y

Sbjct: 190 LPE^^^RIIKMLLNWSACIENlfflLQVAMKFIJJYIDMTKIPETDLYDRVLIKYHKALYSY 249 

Query: 247 AKGDSRGIQSMKQAIFCFQAIGSKHHVENFQEHFNRV 283 
G+ ++Q + F+ + S +E F R+ 

Sbjct: 250 KVGNPHARHDIEQCLSTFEYLDSFGVARKLKEQFERI 286

A related DNA sequence was identified in S.pyogenes <SEQ ID 627> which encodes the amino acid 
sequence <SEQ ID 628>. Analysis of this protein sequence reveals the following: 

Possible site: 29
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3792 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 79/275 (28%) , Positives = 146/275 (52%) , Gaps = 11/275 (4%) 

Query: 9 REFRLNRQFSLKQVASNELSVSQLSRFERGESDLSLTKFLGALEAIDLSISEFMDRVNKY 68

R R +Q S+ +A LS SQ+SRFERGES + + + ++ L L+ ++++I EF+ +K 
Sbjct: 15 RRLRKGKQVSISFLADEYLSKSQISRFERGESEITCSRLLNLLDKuNITIDEFVSAHSKT 74 

Query: 69 QKSDQISLMSQMAQYHYQRDVAGLEKMISVEEGKLKKDSSDIRCRLNIVLFRGMICECDS 128 
+ +L+SQ + + +++V L K++ + KD R + +LF DS

Sbjct: 75 H-THFFTLLSQARKCYAEKNWKLTKLL KDYAHKDYE--RTMIKAILF SIDS 123 

Query: 129 SRKMSEEDLCFLSDYLFQKDSWEISDYIMGNLYRYYNTRHICQLVKEVINQKEYYRDIY 188 
S S+E+L L+DYLF+ + W + IL+GN R+ N + L KE++ Y 
Sbjct: 124 SIAPSQEELTRLTDYLFKVEQWGYYEIILLGNCSRFMNYNTLFLLTKEMVASFAYSEQNK 183

Query: 189 TNRNVVEATLLNWETLIERRALEFATFFLE 248 

TN+ + v 4-N + 1+ E + + + K++ LL +E N Y + + LY G+ + 
Sbjct: 184 TNKMLVTQLSINCLIISIDHSCFEHSRYLINKIDLLLRDEI.NFYEKTVFLYVHGYYKLKQ 243 






Query: 249 GDSRGIQSMKQAIFCFQAIGSKHHVENFQEHFNRV 283 

+ G + M+QA+ F+ +G +++EH+ ++ 

Sbjct: 244 EEMSGEEDMRQALQIFKYLGEDSLYYSYKEHYRQI 278 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
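
The summary line printed with each alignment (for instance `Identities = 79/275 (28%)`) pairs a residue count and an alignment length with a whole-number percentage. As a rough illustration only, the Python sketch below extracts those fields and recomputes the exact ratio for comparison. The function name `parse_summary` is hypothetical and is not part of the analysis software used here; the reported whole-number percentages may differ from the exact ratio by about a point.

```python
import re

# One "Label = n/m (p%)" field of an alignment summary line.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_summary(line):
    """Return {label: (count, length, reported_percent, exact_percent)}."""
    fields = {}
    for label, count, length, reported in FIELD.findall(line):
        count, length = int(count), int(length)
        fields[label] = (count, length, int(reported), 100.0 * count / length)
    return fields

# Summary line of the GAS/GBS alignment in this example.
summary = parse_summary(
    "Identities = 79/275 (28%) , Positives = 146/275 (52%) , Gaps = 11/275 (4%)"
)
for label, (count, length, reported, exact) in summary.items():
    print(f"{label}: {count}/{length}, reported {reported}%, exact {exact:.1f}%")
```

Keeping both the reported and the recomputed value makes it easy to spot counts that were damaged in transcription.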

Example 193 

A DNA sequence (GBSx0199) was identified in S.agalactiae <SEQ ID 629> which encodes the amino acid
sequence <SEQ ID 630>. This protein is predicted to be a permease. Analysis of this protein sequence
reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood = -8.07 Transmembrane 217 - 233 ( 215 - 238)
INTEGRAL Likelihood = -7.96 Transmembrane 163 - 179 ( 158 - 185)
INTEGRAL Likelihood = -7.75 Transmembrane 71 - 87 ( 69 - 91)
INTEGRAL Likelihood = -7.22 Transmembrane 369 - 385 ( 356 - 389)
INTEGRAL Likelihood = -5.15 Transmembrane 279 - 295 ( 275 - 299)
INTEGRAL Likelihood = -4.88 Transmembrane 252 - 268 ( 250 - 270)
INTEGRAL Likelihood = -4.78 Transmembrane 140 - 156 ( 139 - 157)
INTEGRAL Likelihood = -3.56 Transmembrane 343 - 359 ( 340 - 367)
INTEGRAL Likelihood = -3.13 Transmembrane 40 - 56 ( 39 - 56)
INTEGRAL Likelihood = -2.28 Transmembrane 94 - 110 ( 92 - 112)



Final Results 

bacterial membrane Certainty=0.4227 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD36408 GB:AE001788 permease, putative [Thermotoga maritima] 
Identities = 97/396 (24%) , Positives = 194/396 (48%) , Gaps = 15/396 (3%) 

Query: 1 MNINGIKLLSSRAVSKLGDVFYDYGNSTWIASMGGLGQKILGIYQIVELLVSIVLNPFGG 60 

MNN+ S VS+G Y +W+SG+ + G++ I L +I+++PF G 
Sbjct: 1 MNRNLLLFASGSFVSLIGTRIYQVALAWWLYSKTGSSEYV-GLFMISSFLPAI IVSPFAG 59 

Query: 61 ALADRFQRRKILLITDAICAIM CFLLSFIGDDKVMVYGLIVANAILAVSNAFSSPAY 117 

+ DR RR ++++ D + ++ FL+ + + + + L++ +++V ++F +PA 
Sbjct: 60 TWDRHSRRNMMWMDILRGVLFMYLFLMEYFSELTMAL- - LLIVTVLVSVFDSFFNPAV 117 

Query: 118 KSYIPEIVDKADIITYNANLETIVQIISVSSPVLGFLIFNNFGIRITLIVnAITFLISFL 177 

S +P++V K +++ N+ + + + P LG L+ G+ +++++++FLIS + 
Sbjct: 118 DSLLPDLTOKENLVRRNSLYRLLKNLSKILGPALGSLLLKVVGLAGVILINSLSFLISGI 177 

Query: 178 FLYAIKVERVQLSKQEKVAIKNILADIADGFTYIKKEKEIMFFLIIAALLNTFLAMFNYL 237 

F IKVE L K K +N+ DI YI+ + 1+ +++ A++N F + L 

Sbjct: 178 FEMFIKVEEKHLKKVSKE--RMWQDIKSALLYIRSVRFILVTILVIAIMNFFTGSMHVL 235

Query: 238 LP - FTNSLLKTSGAYATILSISAIGS I IGALIARKI - - KSSINSMLSMLVFSSLGVI VMG 294 

LP + L K+ Y T++S+ + G +1 + I ++S+ ++ LV L V V 
Sbjct: 236 LPEHVSKLGKSEWVYGTLMSMLSFGGLI VTFLMATIRTRASVKTLGLNLVGYGLAVFVFA 295 

Query: 295 FPSLFELPIWIPYSGSFLFNSLLTMFNIHFFSQVQIRVDEAYMGRVMSTIFTIAIMFMPI 354 

W+ ++ FL T+FNI+ + +Q+ + E G++ SI ++ +P+ 

Sbjct: 296 MTGNH WLMFAMYFLIGIFQTLFNINVITLLQLAIPEEMRGKIFSLISAVSFSLLPV 351 

Query: 355 GTLFMTIFSFALSNVSFIVIGCAIAILGGLGFSYSK 390 
F S ++ + I GG+ S + 

Sbjct: 352 SYGFFGFLSSYVATAHIFITTSMALIAGGVLISLQR 387

A related DNA sequence was identified in S.pyogenes <SEQ ID 631> which encodes the amino acid
sequence <SEQ ID 632>. Analysis of this protein sequence reveals the following:

Possible site: 45
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.4270 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAD36408 GB:AE001788 permease, putative [Thermotoga maritima] 
Identities = 85/345 (24%) , Positives = 171/345 (48%) , Gaps = 8/345 (2%) 

Query: 40 SLSLVAVYQSLESVIGVLFNLFGGVIADSFKRKKIIITTNILCGTACLVLSFLTKEQWLV 99

S V ++ + ++ + F G + D R+ +++ +IL G + L + h 

Sbjct: 36 SSEYVGLFMISSFLPAIIVSPFAGTWDRHSRRNMMWMDILRGVLFMYLFLMEYFSELT 95 

Query: 100 YAIVL-TNVIl^FMSAFSSPSYKAFTKEIvKKDSISQmSLLETTSWIKVTVPMVAIFL 158 
A++L V+++ +F +P+ + ++V+K+++ + NSL + K+ P + h

Sbjct: 96 MALLLlVTVLVSVFDSFFNPATOSLLPDLWKENLVRANSLYRLLKNLSKILGPALGSLIi 155 

Query: 159 YKLLGIHGVLLLDGLSFLIAALLISFILPVNDEWIKEKVTIREIFNDLKIGFKYVYSHK 218 
K++G+ GV+L++ LSFLI+ + FI +E +K+ R ++ D+K Y+ S + 
Sbjct: 156 LKWGLAGVILINSLSFLISGIFEMFIKV--EEKHLKKVSKERNMWQDIKSALLYIRSVR 213

Query: 219 SIFIITVLSALVNFFIiAAYNLLLPYSNQMFGEISTGLYGTFLTAEAIGGFIGAILSGFVN 278 

I + ++ A++NFF + ++LLP G+ S +YGT ++ + GG I L + 

Sbjct: 214 FILVTILVIAIMNFFTGSMHVLLPEHVSKLGK-SEWVYGTLMSMLSFGGLIVTFLMATIR 272 


Query: 279 KELSSMRLILFLSLSGLMLMIAPPFYIMFHNAIILALSPALFSLFLSIFNIQFFSLVQKD 338 

S LLL GL + + +MN++ L+F ++FNI +L+Q 

Sbjct: 273 TRASVKTLGLNLVGYGLAVFV FAMTGNHWIMFAMYFLIGIFQTLFNINVITLLQLA 328 

Query: 339 VDNDFLGRVFGI I FTITILFMPIGTGFFSVftLNPNNSFNLFI IGS 383

+ + G++F +1 ++ +P+ GFF + + ++FI S 
Sbjct: 329 I PEEMRGKI FSL I SAVSFSLLP VSYGFFGFLSSYVATAHI FITTS 373 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 136/379 (35%) , Positives = 229/379 (59%) , Gaps = 6/379 (1%)

Query: 8 LLSSRAVSKLGDVFYDYGNSTWIASMGGLGQKILGIYQIVELLVSIVIiNPFGGALADRFQ 67 

L+ S+ + ++GDV +D+ N+T++A + ++ +YQ +E ++ ++ N FGG +AD F+ 

Sbjct: 11 LVYSCTIYRIGDVMFDFANNTFLAGLNPASLSLVAVYQSDESVIGVLFNLFGGVIADSFK 70 

Query: 68 RRKILLITDAICAIMCFLLSFIGDDKVMVYGLIVANAILAVSNAFSSPAYKSYIPEIVDK 127 

R+KI++ T+ +C C +LSF+ ++ +VY +++ N I LA +AFSSP+YK++ EIV K 
Sbjct: 71 RKKIIITTNILCGTACLVLSFLTKEQWLVYAIVLTNVILAFMSAFSSPSYKAFTKEIVKK 130 

Query: 128 ADIITYNANLETIVQIISVSSPVLGFLIFNNFGIRITLIVDAITFLISFLFLYAIKVERV 187

I N+ LET +1 V+ P++ ++ GI L++D ++FLI+ L + I 
Sbjct: 131 DSISQLNSLLETTSTVIKVTVPMVAIFLYKLLGIHGVLLLDGLSFLIAALLISFILPVND 190 

Query: 188 QLSKQEKVAIKNILADIADGFTYIKKEKEIMFFLIIAALLNTFLAMFNYLLPFTNSLLK- 246 
++ +EKV 1+ I D+ GF Y+ K I +++AL+N FLA +N LLP++N +

Sbjct: 191 EWIKEKATTIREIFNDLKIGFKYvYSHKSIFIITvLSALVNFFlAAYNLLLPYSNQMFGE 250 

Query: 247 -TSGAYATILSISAIGSIIGALIARKIK3SINSMLSMLVFSSLGVIVMGFPS---LFELP 302 
++G Y T L+ AIG IGA+++ + ++SM +L S G+++M P +F 
Sbjct: 251 ISTGLYGTFLTAEAIGGFIGAILSGFVNKELSSMRLILFLSLSGLMLMLAPPFYIMFHNA 310

Query: 303 IWIPYSGSFLFNSLLTMFNIHFFSQVQIRVDEAYMGRVMSTIFTIAIMFMPIGTLFMTIF 362 

I + S + LF+ L++FNI FFS VQ VD ++GRV IFTI I+FMPIGT F ++ 
Sbjct: 311 IILALSPA-LFSLFLSIFNIQFFSLVQKDVDNDFLGRVFGIIFTITILFMPIGTGFFSVA 369 






Query: 363 SFALSNVSFIVIGCAIAIL 381 

++ + +IG I L 
Sbjct: 370 LNPNNSFNLFIIGSCITTL 388 



WO 02/34771 



PCT/GB01/04789 



-266- 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 194 

A DNA sequence (GBSx0200) was identified in S.agalactiae <SEQ ID 633> which encodes the amino acid 
sequence <SEQ ID 634>. This protein is predicted to be membrane permease OpuCD. Analysis of this 
protein sequence reveals the following: 
Possible site: 46 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -5.68 Transmembrane 91 - 107 ( 88 - 110) 
INTEGRAL Likelihood = -4.30 Transmembrane 15 - 31 ( 9 - 37) 
INTEGRAL Likelihood = -3.72 Transmembrane 72 - 88 ( 72 - 88) 
INTEGRAL Likelihood = -3.19 Transmembrane 124 - 140 ( 123 - 142) 

Final Results 

bacterial membrane Certainty=0.3272 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8509> which encodes amino acid sequence <SEQ ID 8510> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 1 
McG: Discrim Score: -10.69 
GvH: Signal Score (-7.5): -3.79 

Possible site: 39 
>>> Seems to have no N-terminal signal sequence 



ALOM program count: 5 value: -9.02 threshold: 0.0
INTEGRAL Likelihood = -9.02 Transmembrane 35 - 51 ( 25 - 53)
INTEGRAL Likelihood = -5.68 Transmembrane 151 - 167 ( 148 - 170)
INTEGRAL Likelihood = -4.30 Transmembrane 75 - 91 ( 69 - 97)
INTEGRAL Likelihood = -3.72 Transmembrane 132 - 148 ( 132 - 148)
INTEGRAL Likelihood = -3.19 Transmembrane 184 - 200 ( 183 - 202)
PERIPHERAL Likelihood = 2.17 58
modified ALOM score: 2.30 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.4609 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF91342 GB:AF249729 membrane permease OpuCD [Listeria monocytogenes] 
Identities = 104/154 (67%) , Positives = 133/154 (85%) 

Query: 3 IANVIQTI PSLAMIS I IMLGLGLGI KTWATVFLYSLLPI ITNTYTGIRNVDSDLLDAAK 62 

IAN+IQTIP+LAM++++ML +GLG TW ++FLYSLLPI+ NTYTGIRNVD LL++ K 
Sbjct: 60 IANIIQTIPAIAMLAVLMLIMGLGTNTWLSLFLYSLLPILKNTYTGIRNVDGALLESGK 119 

Query: 63 GMGMTKRQRLFMVELPLSI SVIMAGLRNALVVAIGITAIGAFVGGGGLGDI I IRGTNATN 122 

MGMTK Q L ++E+PL++SVIMAG+RNALV+AIG+ AIG FVG GGLGDII+RGTNATN 
Sbjct: 120 AMGMTKWQVLRLIEMPLALSVIMAGIRNALVIAIGVAAIGTFVGAGGLGDIIVRGTNATN 179 

Query: 123 GGAI ILAGSLPTALMAI FSDLILGGIQRMLEPRK 156 

G AIILAG++PTA+MAI +D++LG ++R L P K 
Sbjct: 180 GTAIILAGAIPTAVMAILADVLLGWVERTLNPVK 213 



A related DNA sequence was identified in S.pyogenes <SEQ ID 635> which encodes the amino acid 
sequence <SEQ ID 636>. Analysis of this protein sequence reveals the following: 
Possible site: 49
>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood = -9.24 Transmembrane 39 - 55 ( 31 - 59)
INTEGRAL Likelihood = -7.17 Transmembrane 190 - 206 ( 188 - 211)
INTEGRAL Likelihood = -4.62 Transmembrane 93 - 109 ( 75 - 110)
INTEGRAL Likelihood = -3.66 Transmembrane 76 - 92 ( 75 - 92)
INTEGRAL Likelihood = -2.87 Transmembrane 221 - 237 ( 220 - 237)
INTEGRAL Likelihood = -2.44 Transmembrane 168 - 184 ( 165 - 184)



Final Results 

bacterial membrane --- Certainty=0.4694 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAD45530 GB:AF162656 choline transporter [Streptococcus pneumoniae] 
Identities = 344/508 (67%) , Positives = 425/508 (82%) , Gaps = 2/508 (0%) 

Query: 13 MPSLFOTFQNRFNEWLAALGEHLQISLLSLMIALLIGVPLAALLSRSKRWSDIMLQVTGV 72 

M +L TFQ+RF++WL AL +HLQ+SLL+L++A+L+ +PLA L ++ +D +LQ+ G+ 
Sbjct: 1 MTNLIATFQDRFSDWLTALSQHLQLSLLTLLIAILLAIPLAVFLRYHEKLADWVLQIAGI 60 

Query: 73 FQTIPSLALLGLFIPLMGIGTLPAVTALVIYAIFPILQNTITGLNGIDPSLVEAGIAFGM 132 

FQTIPSLALLGLFIPLMGIGTLPA+TALVIYAIFPILQNTITGL GIDP+L EAGIAFGM 
Sbjct: 61 FQTIPSLALLGLFIPLMGIGTLPALTALVIYAIFPILQNTITGLKGIDPNLQEAGIAFGM 120 

Query: 133 TKWERLKTFEIPIAMPVIMSGVRTSAvMIIGTATLASLIGAGGLGSFILLGIDRNNANLI 192 

T+WERLK FEIP+AMPVIMSG+RT+AV+IIGTATLA+LIGAGGLGSFILLGIDRNNA+LI 
Sbjct: 121 TRVffiRLKKFEIPLAMPVIMSGIRTAAVLIIGTATIiAALIGAGGLGSFILLGIDRNNASLI 180 

Query: 193 LIGAISSALIAIlFNSLLQYLEKASI^IMISFGITLIiALLASYTPMALSQFSKGKDTW 252 

LIGA+SSA+LAI FN LL+ +EKA LR I F + L L SY+P L Q K K+ +V 
Sbjct: 181 LIGALSSAVLAIAFNFLLKVMEKAKLRTIFSGFALVALLLGLSYSPALLVQ- - KEKENLV 238 

Query: 253 IAGKLGAEPDILINLYKELIEDQSDISVELKSNFGKTSFLYEALKSGDIDMYPEFTGTIT 312 

1AGK+G EP+IL N+YK LIE+ + ++ +K NFGKTSFLYEALK GDID+YPEFTGT+T 
Sbjct: 239 IAGKIGPEPEIIiANMYKLLIEENTSMTATVKPNFGKTSFLYEALKKGDIDIYPEFTGTVT 298 

Query: 313 SSLLRDKPPLSNDPKQVYEDAKKGIAKQDKLTLLKPFAYQNTYAVAMPEKLAKEYQIETI 372 

SLL+ P +S++P+QVY+ A+ GIAKQD L LKP +YQNTYAVA+ P+ K+A+EY ++TI 
Sbjct: 299 ESLLQPSPKVSHEPEQVYQVARDGIAKQDHLAYLKPMSYQNTYAVAVPKKIAQEYGLKTI 358 

Query: 373 SDLKAHADTLKAGFTLEFKDRADGYKGMQSQYGLQLSVATMEPALRYQAIQSGDIQvTDA 432 

SDLK LKAGFTLEF DR DG KG+QS YGL L+VAT+EPALRYQAIQSGDIQ+TDA 

Sbjct: 359 SDLKKVEGQLKAGFTLEFNDREDGNKGLQSiyryGLKLNVATIEPALRYQAIQSGDIQITDA 418 

Query: 433 YSTDAEITKYHLKVLKDDKQLFPPYQGAPLMKTSLLTKHPELKGILNQLAGKITEKEMQD 492 

YSTDAE+ +Y L+VL+DDKQLFPPYQGAPLMK +LL KHPEL+ +LN LAGKITE +M 
Sbjct: 419 YSTDAELERYDLQVLEDDKQLFPPYQGAPLMKEALLKKHPELERVLNTLAGKITESQMSQ 478 



Query: 493 MNYEVSVKGADANKVARDYLLKTGLIQK 520 

+NY+V V+G A +VA+++L + GL++K 
Sbjct: 479 LNYQVGVEGKSAKQVAKEFLQEQGLLKK 506 



An alignment of the GAS and GBS proteins is shown below: 

Identities = 53/148 (35%) , Positives = 93/148 (62%) , Gaps = 1/148 (0%) 

Query: 3 IAWIQTIPSLAMISIIMLGLGLGIKTVVATVFLYSLLPIITNTYTGIRNVDSDLLDAAK 62 

+ V QTIPSLA++ + + +G+G V + +Y++ PI+ NT TG+ +D L++A 
Sbjct: 69 VTGVFQTIPSLALLGLFIPLMGIGTLPAVTALVIYAIFPILQNTITGLNGIDPSLVEAGI 128 

Query: 63 GMGMTKRQRLFMVELPLSISVIMAGLRNALVVAIGITAIGAFVGGGGLGDIIIRGTNATN 122 

GMTK +RL E+P+++ VIM+G+R + V+ IG + + +G GGLG 1+ G + N 
Sbjct: 129 AFGMTKWERLKTFEIPIAMPVIMSGVRTSAVMIIGTATLASLIGAGGLGSFILLGIDRNN 188 



Query: 123 GGAI ILAGSLPTALMAIFSDLILGGIQR 150 
+IL G++ +AL+AI + +L +++
Sbjct: 189 AN-LILIGAI SSALLAI I FNSLLQYLEK 215 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
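
The transmembrane predictions in these examples follow a fixed pattern: `INTEGRAL Likelihood = <score> Transmembrane <start> - <end> ( <start> - <end>)`. The Python sketch below shows one way such lines could be read into a structure and ordered along the sequence. It is an illustration only: the function name `parse_tm_segments` and the field names are hypothetical, and treating the range in parentheses as an extended region around the first range is an assumption, not something stated in the text.

```python
import re

# One predicted membrane-spanning segment per INTEGRAL line.
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_tm_segments(text):
    """Return the predicted segments sorted by start position."""
    segments = []
    for match in TM_LINE.finditer(text):
        start, end, ext_start, ext_end = (int(g) for g in match.groups()[1:])
        segments.append({
            "likelihood": float(match.group(1)),
            "core": (start, end),              # first range on the line
            "extended": (ext_start, ext_end),  # range in parentheses (assumed meaning)
        })
    return sorted(segments, key=lambda seg: seg["core"][0])

# Lines reported for SEQ ID 634 in this example.
report = """
INTEGRAL Likelihood = -5.68 Transmembrane 91 - 107 ( 88 - 110)
INTEGRAL Likelihood = -4.30 Transmembrane 15 - 31 ( 9 - 37)
INTEGRAL Likelihood = -3.72 Transmembrane 72 - 88 ( 72 - 88)
INTEGRAL Likelihood = -3.19 Transmembrane 124 - 140 ( 123 - 142)
"""
segments = parse_tm_segments(report)
strongest = min(segments, key=lambda seg: seg["likelihood"])
print(len(segments), "segments; most negative likelihood at", strongest["core"])
```

Sorting by start position gives the order of the segments along the protein, while the most negative likelihood identifies the segment listed first in the report.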

Example 195 

A DNA sequence (GBSx0201) was identified in S.agalactiae <SEQ ID 637> which encodes the amino acid
sequence <SEQ ID 638>. This protein is predicted to be choline transporter-related. Analysis of this protein
sequence reveals the following:

Possible site: 44

>>> May be a lipoprotein 

INTEGRAL Likelihood = -3.03 Transmembrane 306 - 322 ( 306 - 327) 

Final Results 

bacterial membrane Certainty=0.2211 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9517> which encodes amino acid sequence <SEQ ID 9518>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15386 GB:Z99121 glycine betaine/carnitine/choline ABC transporter (osmoprotectant-binding protein) [Bacillus subtilis]
Identities = 168/303 (55%), Positives = 224/303 (73%), Gaps = 1/303 (0%)

Query: 2 LKKSHFLQIFTLCLALLTISGCQLTDTKKSGHTTIKVAAQSSTESSIMANIITELIHHEL 61 

+ K +L F L +L + GC L + TIK+ AQS TES I+AN+I +LI H+ 

Sbjct: 1 MTKIKWLGAFALVFVML-LGGCSLPGLGGASDDTIKIGAQSMTESEIVANMIAQLIEHDT 59 

Query: 62 GYNTTL I SNLGS STVTHQALLRGDADI AATRYTGTD ITGTLGLKAVKDPKEASKI VKTEF 121

NT L+ NLGS+ V HQA+L GD DI+ATRY+GTD+T TLG +A KDPK+A IV+ EF 
Sbjct: 60 DIOTALVKNLGSFTVQHQAMLGGDIDISATRYSGTDLTSTLGKEAEKDPKKALNIVQNEF 119 

Query: 122 QKKYNQTWYPTYGFSDTYAFMVTKEFARQNKITKISDLKKLSTTMKAGVDSSWMNREGDG 181 
QKR++ W+ +YGF +TYAF VTK+FA + I +SDLKK ++ K GVD++W+ R+GDG

Sbjct: 120 QKRFSYKWFDSYGFDNTYAFTVTKKFAEKEHINTVSDLKKNASQYKLGVDNAWLKRKGDG 179 

Query: 182 YTDFAKTYGFEFSHIYPMQIGLVYDAVESNKMQSVLGYSTDGRISSYDLEILRDDKKFFP 241 
Y F TYGFEF YPMQIGLVYDAV++ KM +VL YSTDGRI +YDL+IL+DDK+FFP 
Sbjct: 180 YKGFVSTYGFEFGTTYPMQIGLVYDAVKNGKMDAVLAYSTDGRIKAYDLKILKDDKRFFP 239

Query: 242 PYFASMVVNNSIIKKDPKLKKLLHRLDGKIl^KTMQNLNYMVDDICLLEPSVVAKQFLEKN 301 

PY+ S V+ ++K+ P+L+ ++++L G+I+ +TMQ LNY VD KL EPSWAK+FLEK+ 
Sbjct: 240 PYDCSPVIPEKVLKEHPELEGVINKLIGQIDTETMQEIJSJYEVDGKLKEPSVVAKEFLEKH 299 






Query: 302 HYF 304 
HYF 

Sbjct: 300 HYF 302 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8511> and protein <SEQ ID 8512> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: 22 Crend: 5
McG: Discrim Score: 10.26
GvH: Signal Score (-7.5): -4.19

Possible site: 44
>>> May be a lipoprotein

ALOM program count: 0 value: 8.65 threshold: 0.0 
PERIPHERAL Likelihood =8.65 66 
modified ALOM score: -2.23 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

56.3/75.4% over 287aa 

Bacillus subtilis 

EGAD | 109208 | glycine betaine/carnitine/choline ABC Insert characterized 

SP|032243|OPCC_BACSU GLYCINE BETAINE/CARNITINE/CHOLINE-BINDING PROTEIN PRECURSOR 
(OSMOPROTECTANT-BINDING 
PROTEIN) . Insert characterized 

GP|2635894|emb|CAB15386.l| |Z99121 glycine betaine/carnitine/choline ABC transporter 
(osmoprotectant -binding protein) Insert characterized 

PIR|E69670|E69670 glycine betaine/carnitine/choline ABC transporter (osmoprotec) opuCC - 
Insert characterized 

ORF0118K349 - 1212 of 1524) 

EGAD|109208|BS3376(15 - 302 of 303) glycine betaine/carnitine/choline ABC {Bacillus
subtilis} SP|032243|OPCC_BACSU GLYCINE BETAINE/CARNITINE/CHOLINE-BINDING PROTEIN PRECURSOR
(OSMOPROTECTANT-BINDING PROTEIN). GP|2635894|emb|CAB15386.1||Z99121 glycine
betaine/carnitine/choline ABC transporter (osmoprotectant-binding protein) {Bacillus
subtilis} PIR|E69670|E69670 glycine betaine/carnitine/choline ABC transporter (osmoprotec)
opuCC - Bacillus subtilis
%Match = 33.5
%Identity = 56.2 %Similarity = 75.3
Matches = 162 Mismatches = 71 Conservative Sub.s = 55

162 192 222 252 282 - 312 342 372 

VWFFLIVF*QCLIFIFSVRYKSGSMKRIWGVXXN*LXXITGNSSNAQ 

MTKIKWLGAFALVFVMLLGGCS 
10 20 

402 432 462 492 522 552 582 612 

LTDTKKSGHTTIKVAAQSSTESSIMANIITELIHHELGYNTTLISNLGSSTVTHQALLRGDADIAATRYTGTDITGTLGL 

| : |||: HI III [:||:| =11 1 = II 1= 1111= I 111 = 1 II 11=1111=111=1 HI 

LPGLGGASDDTIKIGAQSMTESEIVANMIAQLIEHDTDLNTALVICNIX3SNYVQHQAMLGGDIDISATRYSGTDLTSTLGK 

40 50 60 70 80 90 100 

642 672 702 732 762 792 822 852 

KAVTOPKFASKIVKTEXQKRYNQTWYPTYGFSDTYAFMVTKEF 

:| ||||:| ||: | |||:= |: =111 =1111 111 = 11 = I =11111 == I 111-1= 1 = 1111 
FJU3KDPKKALNIVQNEFQKRFSYKWFDSYGFDNTYAFTVTKKFAEKI3HI 

120 130 140 150 160 170 180 

. 882 912 942 972 1002 1032 1062 1092 

FAKTYGFEFSHIYPMQIGLVYDAVESNKMQSVLGYSTDGRISSYDLEILRDDKKFFPPYEASMVVNNSIIKKDPKLKKLL 

i nun iiiiiiiiiiii = = ii =n mini =111 = 11 = 111 = 11111= 1 1= = = 1= 1 = 1= = = 

FVSTYGFEFGTTYPMQIGLVYDAVKNGKMDAVLAYSTDGRIKAYDLKILKDDKRFFPPYDCSPVIPEKVLKEHPELEGVI 
200 210 220 230 240 250 260 

1122 1152 1182 1212 1242 1272 1302 1332 

HRLDGKINLKTMQNLNYMVDDKLLEPSVVAKQFL 

==l 1=1= =111 III II II 1111111=1111=111 
NKLIGQIDTETMQELNYEVDGKLKEPSWAKEFLEKHHYFD 

280 290 300 

SEQ ID 8512 (GBS23) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 14 (lane 8; MW 35kDa). 



The GBS23-His fusion product was purified (Figure 194, lane 9) and used to immunise mice. The resulting 
antiserum was used for Western blot (Figure 251). These tests confirm that the protein is immunoaccessible 
on GBS bacteria. 

Example 196 

A DNA sequence (GBSx0202) was identified in S.agalactiae <SEQ ID 639> which encodes the amino acid 
sequence <SEQ ID 640>. This protein is predicted to be membrane permease OpuCB (opuBB). Analysis of 
this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood = -9.66 Transmembrane 25 - 41 ( 18 - 45)
INTEGRAL Likelihood = -7.96 Transmembrane 182 - 198 ( 174 - 202)
INTEGRAL Likelihood = -4.83 Transmembrane 61 - 77 ( 57 - 95)
INTEGRAL Likelihood = -4.09 Transmembrane 78 - 94 ( 78 - 95)
INTEGRAL Likelihood = -1.22 Transmembrane 134 - 150 ( 134 - 150)



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF91340 GB:AF249729 membrane permease OpuCB [Listeria monocytogenes]
Identities = 121/208 (58%) , Positives = 160/208 (76%)

Query: 1 MVNFLSQYGMQILVKTWEQVYISFFAIALGIAIAVPIiGVVLTRFPKVAKIIIAIASMLQT 60 

+V F + G +LV+TW+ ++IS A+ LGIA+AVP G++LTR PKVA +1 + S+LQT 
Sbjct: 4 IWFFQENGHNLLVQTWQHLFISLSAVILGIAVATOTGILLTRSPKVANFVIGWSVLQT 63 

Query: 61 IPSLALLALMIPLFGIGKIPAIVALFIYSLLPILRNTYIGMNNVNPTLKDCAKGMGMKPI 120 

+PSLA+LA +IP G+G +PAI+ALFIY+LLPILRNT+IG+ V+ L + +GMGM 
Sbjct: 64 VPSLAILAFI I PFLGVGTLPAI IALFI YALLP I LRNTFIGVRGVDKNLIESGRGMGMTNW 123 

Query: 121 QSIFQVELPLATPIIMAGIRLSTIYVIAWATLASYIGAGGLGDLIFSGLNLFQSKLILGG 180 

Q I VE+P + +IMAGIRLS +YVIAWATLASYIGAGGLGD IF+GLNL++ LILGG 
Sbjct: 124 QLIVNVEIPNSISVIMAGIRLSAVYVIAWATLASYIGAGGLGDFIFNGLNLYRPDLILGG 183 

Query: 181 TI PVI ILSLI IDYLLGLLETALTPRTTR 208 

IPV IL+L++++ LG LE LTP+ R 
Sbjct: 184 AIPVTILALWEFALGKLEYRLTPKAIR 211 

A related GBS gene <SEQ ID 8513> and protein <SEQ ID 8514> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 
McG: Discrim Score: -9.08 
GvH: Signal Score (-7.5): -1.86 

Possible site: 37 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 5 value: -8.60 threshold: 0.0 



INTEGRAL Likelihood = -8.60 Transmembrane 25 - 41 ( 18 - 45)
INTEGRAL Likelihood = -7.96 Transmembrane 182 - 198 ( 174 - 202)
INTEGRAL Likelihood = -4.83 Transmembrane 61 - 77 ( 57 - 95)
INTEGRAL Likelihood = -4.09 Transmembrane 78 - 94 ( 78 - 95)
INTEGRAL Likelihood = -1.22 Transmembrane 134 - 150 ( 134 - 150)
PERIPHERAL Likelihood = 2.70 156
modified ALOM score: 2.22 



Final Results

bacterial membrane --- Certainty=0.4864 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

*** Reasoning Step: 3



Final Results 

bacterial membrane Certainty=0.4439 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01825(301 - 927 of 1233)

GP|9651976|gb|AAF91340.1|AF249729_2|AF249729(4 - 212 of 218) membrane permease OpuCB
{Listeria monocytogenes}
%Match = 30.2
%Identity = 57.9 %Similarity = 79.9
Matches = 121 Mismatches = 42 Conservative Sub.s = 46

117 147 177 207 237 267 297 327 

STCF*YLKTY*FLCYGRRLT*KYC*AYFKTWFKIRSSC*E*E*LKGHCYSCIPS*YVIRYYLGRY*NGGSIMVNFLSQYG

=1 |: = I 
MDAI VTFFQENG 
10 

357 387 417 447 477 507 537 567

MQILVKTWEQWISFFAIALGIAlAVPXGvVLTRFPKVAKIIIAIASMLQTIPSLALLALMIPLFGIGKIPAIVALFIYS 



HNLLVQTWQHLFISLSAVILGIAVAVPTGILLTRSPKVANFVIGWSVLQTVPSLAILAFIIPFLGVGTLPAIIALFIYA 
30 40 50 60 70 80 90 



597 627 657 687 717 747 777 807 

LLPILRNTYIGMNNVNPTLKDCAKGMGMKPIQSIFQVELPIiATPIIMAGIRLSTIYVIAWATLASYIGAGGLGDLIFSGL 

IIIIIIIHI: 1 = I : :|||| I I 11 = 1 = =11111111 = I I I I I I I I I I I I I I I I I I I = I I = I I 
LLPILRNTFIGWGvDKNLIESGRGMGMTNWQLIVNvEIPNSISVIMAGIRLSAVYVIAWATIASYIGAGGLGDFIFNGL 
110 120 130 140 150 160 170

837 867 897 927 957 987 1017 1047 

NLFQSKLILGGTIPVIILSLIIDYLLGLLETALTPRTTRRFA*ICLKNRTFYRYLHFA*PS*RFLVVN*PILKSLVIPQL 

ll = = Mill III 11 = 1 = = = = II II llh I 
NLYRPDLILGGA1PVTILALWEFALGKLEYRLTPKAIREAREGGE

190 200 210 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
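
The percentages reported for the database hit in this example are consistent with reading %Identity as Matches divided by the sum of Matches, Mismatches and Conservative Sub.s, and %Similarity as Matches plus Conservative Sub.s over the same total. That relationship is inferred from the reported numbers rather than stated in the text, so the Python sketch below should be taken as an assumption-laden illustration; the function name `identity_and_similarity` is hypothetical.

```python
def identity_and_similarity(matches, mismatches, conservative):
    """Percent identity and similarity over the aligned positions.

    Assumes the aligned length is the sum of the three reported counts.
    """
    aligned = matches + mismatches + conservative
    identity = 100.0 * matches / aligned
    similarity = 100.0 * (matches + conservative) / aligned
    return identity, similarity

# Counts reported for the OpuCB hit: Matches = 121, Mismatches = 42,
# Conservative Sub.s = 46.
identity, similarity = identity_and_similarity(121, 42, 46)
print(f"%Identity = {identity:.1f}  %Similarity = {similarity:.1f}")
# prints "%Identity = 57.9  %Similarity = 79.9"
```

Under the same assumption, the counts given for the other database hits of this type reproduce their reported percentages as well, which makes the calculation a convenient check on transcribed figures.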

Example 197 

A DNA sequence (GBSx0203) was identified in S.agalactiae <SEQ ID 641> which encodes the amino acid
sequence <SEQ ID 642>. Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3531 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF91339 GB:AF249729 ATPase OpuCA [Listeria monocytogenes] 
Identities = 230/380 (60%) , Positives = 298/380 (77%) , Gaps = 4/380 (1%) 

Query: 6 IIEYQNINKVY-GENVAvEDINLKIYPGDFVCFIGTSGSGKTTLMRMVNHMLKPTNGTLL 64 
+++++++ K Y G AV D+ L I G+FVCFIG SG GKTT M+M+N +++PT G +

Sbjct: 1 MLKFEHvTKTYKGGKKAvNDLTLNIDKGEFVCFIGPSGCGKTTTMKMINRLIEPTEGKIF 60 

Query: 65 FKGKD I ST INPIELRRRIGYV1QNIGLMPHMTIYENIVLVPKLLKWSEEAKRAKARELI K 124 
KDI +P++LRR IGYVIQ IGLMPHMTI ENIVLVPKLLKWSEE K+ +A+ELIK 
Sbjct: 61 INDKDIMAEDPVKLRRSIGYVIQQIGLMPHMTIRENIVLVPKLLKWSEEICKQERAKELIK 120




Query: 125 LVELPEEYLDRYPSELSGGQQQRIGVIRM^AADQDIILMDEPFGALDPITREGIQDLVKS 184 

LV+LPEE+LDRYP ELSGGQQQRIGV+RALAA+Q++ILMDEPFGALDPITR+ +Q+ K+ 
Sbjct: 121 LVDLPEEFLDRYPYELSGGQQQRIGVLRALAftEQNLILMDEPFGALDPITRDSLQEEFKN 180 

Query: 185 LQEEMGKTIILVTHDMDEALKLATKIIVMDNGKMVQEGTPNDLLHHPATSFVEQMIGEER 244 

LQ+E+GKTII VTHDMDEA+KLA +I++M +G++VQ TP+++L +PA SFVE IG++R 
Sbjct: 181 LQKELGKTI I FVTHDMDEAI KLADRIVIMKDGEIVQFDTPDE I LRNPANS FVEDFIGKDR 240 

Query: 245 LLHAQADITPVKQIMLNNPVSITAEKTLTEAITLMRQKRVDSLLVTDNGKLI-GFIDLES 303 

L+ A+ D+T V QIM NPVSITA+K+L AIT+M++KRVD+LLV D G ++ GFID+E 
Sbjct: 241 LIEAKPDOTQVAQINnsn^PVSITADKSLGi^IT\7MKEKRVDTLLVVDEGNVLKGFIDVEQ 300 

Query: 304 LSSKYKKDRLVSDILKHTDFYVMEDDLLROTAERILKLGLKYAPVVDHENliJlKGIVTRAS 363 

+ + V DI++ FYV ED LLR+T +RILK G KY PWD + L GIVTRAS 
Sbjct: 301 IDIJSIRRTATSVMDIIEKNVFYVYEDTLLRDTVQRILKRGYKYIPVVDKDKRLVGIVTRAS 360 

Query: 364 LVDMLYDIIWGDTE- -TEDQ 381 

LVD++YD IWG E TE+Q 
Sbjct: 361 LVD IVYDS IWGTLEDATENQ 380 

A related DNA sequence was identified in S.pyogenes <SEQ ID 643> which encodes the amino acid 
sequence <SEQ ID 644>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3619 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 102/237 (43%) , Positives = 165/237 (69%) , Gaps = 1/237 (0%) 

Query: 6 IIEYQNINKVYGENVAVEDINLKIYPGDFVCFIGTSGSGKTTLMRMVNHMLKPTNGTLLF 65 

+1 + N++K +G+ +++ +1 +F +G SGSGKTTL++M+N +++P++G +L 
Sbjct: 1 MIRFNNVSKTFGQTKVLQEQTFQINDREFFVLVGPSGSGKTTLLKMINCLIEPSSGDILL 60 

Query: 66 KGKDISTINPIELRRRIGYVIQNIGLMPHMTIYENIVLVPKLLKWSEEAKRAKARELIKL 125 

+ ++ E+R IGYV+Q I L P++T+ ENI ++P++ +WS E R K EL+ 
Sbjct: 61 NNVPQTELDLREMRLS IGYVLQQI ALFPNLTVAENIAI I PEMKQWSAEEIRQKTEELLDK 120 

Query: 126 VELP-EEYLDRYPSELSGGQQQRIGVIRALAADQDIILMDEPFGALDPITREGIQDLVKS 184 

V LP + +YLDRYPS +LSGG+QQRIG+ +RA+ + I+LMDEPF ALDPI+R+ +Q+L+ S 
Sbjct: 121 VGLPAKDYLDRYPSDLSGGEQQRIGIVRAIISHPKILLMDEPFSALDPISRKQLQELMLS 180 

Query: 185 LQEEMGKTIILVTHDMDEALKLATKIIWIDNGKMVQEGTPNDLLHHPATSFVEQMIG 241 

L +E TI+ VTHD+DEA+KL ++ +++ G++VQ P + HPA +FV + G 
Sbjct: 181 LHKEFDMTIVFVTHDIDEaiKLGDRVAILNEGEIVQLDRPEMIKTHPANAFWNLFG 237 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 198 

A repeated DNA sequence (GBSx0212) was identified in S.agalactiae <SEQ ID 645> which encodes the 
amino acid sequence <SEQ ID 646>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4736 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 199 

A DNA sequence (GBSx0213) was identified in S.agalactiae <SEQ ID 647> which encodes the amino acid 
sequence <SEQ ID 648>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.06 Transmembrane 18 - 34 ( 18 - 34)

Final Results

bacterial membrane Certainty=0.1426 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8515> and protein <SEQ ID 8516> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: 20 Crend: 5 

Sequence Pattern: CQMN 
SRCFLG: 0 

McG: Length of UR: 19 

Peak Value of UR: 2.60 

Net Charge of CR: 3 
McG: Discrim Score: 7.77 
GvH: Signal Score (-7.5) : -4.89 

Possible site: 25 
>>> May be a lipoprotein

Amino Acid Composition: calculated from 21 
ALOM program count: 0 value: 13.21 threshold: 0.0 
PERIPHERAL Likelihood = 13.21 115 
modified ALOM score: -3.14 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01527(346 - 465 of 1095) 

EGAD|7398|7198(2 - 41 of 47) lysis protein for colicin e9 precursor {Escherichia coli}
EGAD|41475|43808 lysis protein { } SP|P13344|LYS5_ECOLI LYSIS PROTEIN FOR COLICIN E5
PRECURSOR. GP|40543|emb|CAA33861.1||X15857 lysis protein (AA 1-47) {Enterobacteriaceae}
GP|144373|gb|AAA98053.1||M30445 colicin release protein {Plasmid ColE5-099}
PIR|JQ0330|JQ0330 colicin E5 lysis protein precursor - Escherichia coli plasmid ColE5-099
%Match = 3.7
%Identity = 35.0 %Similarity = 52.5
Matches = 14 Mismatches = 19 Conservative Sub.s = 7

135 165 195 225 255 285 315 345 

YIYFFHCRRIYI I ININY* FN*GI *NIQMIFCLHVKTKTIKIRENFVILKLIL*CW* I IVNFI IYLI YKI YILRKENMMR 
375 405 435 465 495 525 555 585

5 KYIKWLIPISIFGMILGGCQMNSEHKIQSISffivTCNSKQSEV^ 

I I hi : = =11 II I I :| I I :|: 
KKITWIILLLLAAIILAACQANYIHDVQGGTVSPSSSAELTGLATQ 
20 30 40 

SEQ ID 8516 (GBS389) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 74 (lane 6; MW 18kDa). 

The GBS389-His fusion product was purified (Figure 214, lane 4) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 313), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 200 

A DNA sequence (GBSx0214) was identified in S.agalactiae <SEQ ID 649> which encodes the amino acid
sequence <SEQ ID 650>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3766 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
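
The "Final Results" lines in these examples all follow the same pattern: a localisation site, a certainty value, and a label. As an illustration only, the following minimal sketch reads such lines and picks the site with the highest certainty. The function names and the regular expression are assumptions introduced here; they are not part of the analysis software used in the examples. The pattern tolerates stray spaces inside the number, as can occur in extracted text.

```python
import re

# Hypothetical helper, not part of the original analysis pipeline.
# Matches e.g. "bacterial cytoplasm Certainty=0 .3766 (Affirmative) < succ>".
RESULT_RE = re.compile(
    r"bacterial\s+(\w+).*?Certainty\s*=\s*([0-9.\s]+?)\s*\(([^)]+)\)"
)


def parse_final_results(lines):
    """Map each localisation site to a (certainty, label) tuple."""
    results = {}
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            site, raw_value, label = match.groups()
            # Drop any whitespace that crept into the number ("0 .3766").
            certainty = float("".join(raw_value.split()))
            results[site] = (certainty, label.strip())
    return results


def most_likely_site(results):
    """Return the site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


# Values taken from Example 200 above.
example_200 = [
    "bacterial cytoplasm Certainty=0 .3766 (Affirmative) < succ>",
    "bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>",
    "bacterial outside Certainty=0 . 0000 (Not Clear) < succ>",
]
parsed = parse_final_results(example_200)
print(most_likely_site(parsed), parsed["cytoplasm"])
```

Taking the maximum is a simplification: where all three certainties are 0.0000 (as in Example 211) it returns an arbitrary site, so such cases would need separate handling.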

Example 201 

A DNA sequence (GBSx0215) was identified in S.agalactiae <SEQ ID 651> which encodes the amino acid
sequence <SEQ ID 652>. Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3882 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 202

A DNA sequence (GBSx0216) was identified in S.agalactiae <SEQ ID 653> which encodes the amino acid 
sequence <SEQ ID 654>. This protein is predicted to be lectin, alpha subunit precursor. Analysis of this 
protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0653 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 203 

A DNA sequence (GBSx0217) was identified in S.agalactiae <SEQ ID 655> which encodes the amino acid
sequence <SEQ ID 656>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.6569 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 204 

A DNA sequence (GBSx0218) was identified in S.agalactiae <SEQ ID 657> which encodes the amino acid 
sequence <SEQ ID 658>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5736 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 205

A DNA sequence (GBSx0219) was identified in S.agalactiae <SEQ ID 659> which encodes the amino acid 
sequence <SEQ ID 660>. Analysis of this protein sequence reveals the following: 

Possible site: 52

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.6243 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8517> which encodes amino acid sequence <SEQ ID 8518> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 206 

A DNA sequence (GBSx0220) was identified in S.agalactiae <SEQ ID 661> which encodes the amino acid
sequence <SEQ ID 662>. Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2374 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB89623 GB:AE000990 repressor protein [Archaeoglobus 
fulgidus] 

Identities = 34/62 (54%) , Positives = 46/62 (73%) 

Query: 11 LKQVREDIGMTQQELAIRIGVRRETIGHLENNRYNPSLEMALKIVKIFDMKIEDIFQLRK 70

+K+ R MTQ+ELA R+GVRRETI LE +YNPSL++A KI ++F+ KIEDIF + 
Sbjct: 5 IKEFRAKFNMTQEELAKRVGVRRETIVFLEKGKYNPSLKLAYKIARVFNAKIEDIFIFDE 64 

Query: 71 ED 72 
E+

Sbjct: 65 EE 66 

There is also homology to SEQ ID 412. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 207

A DNA sequence (GBSx0221) was identified in S.agalactiae <SEQ ID 663> which encodes the amino acid 
sequence <SEQ ID 664>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB61817 GB:AL133236 putative acetyl transferase [Streptomyces 
coelicolor A3 (2) ] 

Identities = 30/97 (30%) , Positives = 52/97 (52%) , Gaps = 1/97 (1%) 

Query: 82 VGMLNIOTLARADMQWGELGYVFHNQFWSNGYAFESILALLNSTYEKLGFHHIEAQITPG 141 

VGM ++ + Q GE+ Y+ H + W G E +LL+ +++ G H I A P 
Sbjct: 72 VGMGDLHVRSHTQRQ-GEISYI VHPRVWGQGIGTEIGRSLLSLGFDRWGLHRIRATCDPR 130 

Query: 142 NERSEKLVRRLGLTYETTRKDFSFENGKWTDKLIYSI 178 

N+ S +4-+ +LG+TYE + ++ WD L++SI 
Sbjct: 131 NQASSRVLTKLGMTYEGRHRHTAWIRDGWRDSLVFSI 167 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 208 

A DNA sequence (GBSx0222) was identified in S.agalactiae <SEQ ID 665> which encodes the amino acid 
sequence <SEQ ID 666>. This protein is predicted to be p20 protein. Analysis of this protein sequence
reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3794 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

Final Results

bacterial cytoplasm Certainty=0.1044 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA30415 GB:X07542 P20 (AA 1-178) [Bacillus licheniformis]
Identities = 56/175 (32%) , Positives = 94/175 (53%) , Gaps = 6/175 (3%)

Query: 16 TVLTERLRLQPVELTNVM)FLEFSSDSEWFYMQRYKANTvEFAQVVLA NVCMKSPL 72
T+ TERL L+ +EL + + ++ SD E YM V +A+ ++ ++ ++
Sbjct: 3 TLYTERLTLRKMELEDADVLCQYWSDPEVTKYMNITPFTDVSQARDMIQMINDLSLEGQA 62

Query: 73 GIYAMIEKESQKMIGIIELEIRDEFS--AEFGYII1NKNYNGKGYMTEACSKLMSIGFEHL 130
+++I KE+ ++IG + D+ + AE GY L +N+ GKG+ +EA KL+ GF L
Sbjct: 63 NRFSIIVKETDEVIGTCGFNMIDQENGRAEIGYDLGRNHWGKGFASEAVQKLIDYGFTSL 122

Query: 131 DLERIYARFDINNKKSGNVMERIGMKKEGELRHLAKNPKGEWKTRAYYSILKEEY 185
+L RI A+ + N S ++ + +KEG LR K KG +S+LK EY
Sbjct: 123 NLNRIEAKAffiPENTPSIKLLNSLSFQKEGLLRDYEK-AKGRLIDVYMFSLLKREY 176

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 209 

A DNA sequence (GBSx0223) was identified in S.agalactiae <SEQ ID 669> which encodes the amino acid 
sequence <SEQ ID 670>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5180 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA87001 GB:Z46902 unknown [Saccharomyces cerevisiae]

Identities = 105/224 (46%) , Positives = 148/224 (65%) , Gaps = 3/224 (1%) 

Query: 1 MGDWENFTEGKNPKIDTLNGKTVRIEKINPD-HFEDLFQVYGELSTEDSLTYISFSKFN 59 
+G VE +T P+ L G T R+E ++ + H +LF YE + TY+ F 
Sbjct: 11 VGADVEGWTTRAFPEKWLKGNTCRLEPLDRERHGSELFSAYSEAG-QKLWTYLPAGPFT 69

Query: 60 SKNEFDVFFQTLLKSEDPYYIAIVBNNTGKVIfiTFSLMRIDTKNRWEMGWVvYSSKLKQ 119 

+ E+ F + L +++D AI++ T + +GT L+RID N +E+G+W+S +L++ 
Sbjct: 70 NLEEYLEFIKELNETKTJTVPFAIINKETERAVGTLCLIRIDEANGSLEVGYWFSPELQK 129 


Query: 120 TRIATEAQYLVMKWFEELCIYRRYEWKCDSLNAPSNNSAKRLGFTFEGTFRQAVvYKGRN 179 

T IATEAQ+L+MKYVF++L YRRYEWKCDSLN PS +A RLGF +EGTFRQ WYKGR 
Sbjct: 130 TIIATEAQFLLMKYVFDDLQYRRYEWKCDSLNGPSRRAaMRLGFKYEGTFRQVVVYKGRT 189 

Query: 180 RDTNWYS ILDKEWPEKKTRFEKWLDDSNFA VNGYQ IRSLSSIEQ 223

RDT W+SI+DKEW + FE+WLD +NF NG Q R +++I + 
Sbjct: 190 RDTQWFSIIDKEWLRIRKTFEEWLDKTNFE-NGKQKRGIAAIRE 232 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 210 

A DNA sequence (GBSx0224) was identified in S.agalactiae <SEQ ID 671> which encodes the amino acid 
sequence <SEQ ID 672>. Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.15 Transmembrane 25 - 41 ( 20 - 49)

Final Results

bacterial membrane Certainty=0.5861 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8519> and protein <SEQ ID 8520> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: -3.31 
GvH: Signal Score (-7.5): -4.44 

Possible site: 39 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -12.15 threshold: 0.0 

INTEGRAL Likelihood =-12.15 Transmembrane 25 - 41 ( 20 - 49) 
PERIPHERAL Likelihood = 11.94 59 
modified ALOM score: 2.93 

*** Reasoning Step: 3 



Final Results

bacterial membrane Certainty=0.5861 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

SEQ ID 672 (GBS43) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 5 (lane 4; MW 34kDa). It was also expressed in E.coli as a GST-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 13 (lane 9; MW 58kDa) and in Figure 15 (lane 
4; MW 59kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 211 

A DNA sequence (GBSx0225) was identified in S.agalactiae <SEQ ID 673> which encodes the amino acid 
sequence <SEQ ID 674>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9519> which encodes amino acid sequence <SEQ ID 9520>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 212

A DNA sequence (GBSx0226) was identified in S.agalactiae <SEQ ID 675> which encodes the amino acid 
sequence <SEQ ID 676>. Analysis of this protein sequence reveals the following: 

Possible site: 44

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.54 Transmembrane 165 - 181 ( 164 - 181)

INTEGRAL Likelihood = -0.85 Transmembrane 67 - 83 ( 67 - 84)

Final Results

bacterial membrane Certainty=0.1617 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA82211 GB:Z28353 similar to a B.subtilis gene (GB: 
BACHEMEHY_5) [Clostridium pasteurianum] 
Identities = 40/185 (21%) , Positives = 87/185 (46%) , Gaps = 6/185 (3%) 

Query: 18 MPKGKQKVILSAIELFASQGFHGTSTAQLAKNAEVSQATIYKYFETKDKLLVFILELIVQ 77 

M K K + SAI++F++ G++G + ++A NA V++ T+Y +F++K+++ +I+E V 
Sbjct: 1 MNKTKDNIFYSAIKVFSNNGYNGATMDEIASNAGVAKGTLYYHFKSKEEIFKYIIEEGVN 60 

Query: 78 TIGRPFFTELSTFSTKEELIHFFVQDRFKFIEKNNDLIKILMQELLINSETSTIFTKLIN 137 

+ TE+ ++IKND K++ +L ++ 

Sbjct: 61 LMKNEIDEATDKEKTALEKLKAVCRVQLNLIYKNRDFFKVIASQLWGKELRQLELRDIMR 120 

Query: 138 STDPNITKI FNCLSEGNSL NKMEILRAVIGQFITFFIQLY-ILNIKPENLEEELKQI 193

+ +1 + E S+ N + + A +G + + LY ++N + +N+ ++ + 

Sbjct: 121 NYWHIEEFVKDAMEAGSIKKGNSLFVAYAFLGTLCS - -VSLYEVINAENDNIMNTIENL 178 

Query: 194 EKQIL 198 

IL

Sbjct: 179 MNYIL 183 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
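
The "INTEGRAL Likelihood ... Transmembrane" lines reported in these examples can be collected into structured records. The sketch below is illustrative only: the record type, the field names and the regular expression are assumptions introduced here. In particular, the parenthesised range is treated as an extended span of the same segment, which is an assumption about the output format and not something stated in the text.

```python
import re
from typing import NamedTuple


class TMSegment(NamedTuple):
    """One predicted transmembrane segment (field names are hypothetical)."""
    likelihood: float
    start: int
    end: int
    ext_start: int  # assumed meaning of the parenthesised range
    ext_end: int


TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return all transmembrane segments in the text, ordered by start position."""
    segments = [
        TMSegment(float(m[1]), int(m[2]), int(m[3]), int(m[4]), int(m[5]))
        for m in TM_RE.finditer(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)


# Lines taken from Example 212 above.
example_212 = """
INTEGRAL Likelihood = -1.54 Transmembrane 165 - 181 ( 164 - 181)
INTEGRAL Likelihood = -0.85 Transmembrane 67 - 83 ( 67 - 84)
"""
segments = parse_tm_segments(example_212)
for seg in segments:
    print(seg.start, seg.end, seg.likelihood)
```

Sorting by start position lists the segments in sequence order, whereas the reported output lists them by likelihood value.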

Example 213 

A DNA sequence (GBSx0227) was identified in S.agalactiae <SEQ ID 677> which encodes the amino acid
sequence <SEQ ID 678>. Analysis of this protein sequence reveals the following: 

Possible site: 24 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2389 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 214 

A DNA sequence (GBSx0228) was identified in S.agalactiae <SEQ ID 679> which encodes the amino acid 
sequence <SEQ ID 680>. Analysis of this protein sequence reveals the following: 

Possible site: 43
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-13.32 Transmembrane 341 - 357 ( 333 - 361)

INTEGRAL Likelihood =-10.93 Transmembrane 253 - 269 ( 238 - 277)

INTEGRAL Likelihood =-10.77 Transmembrane 172 - 188 ( 166 - 196)

INTEGRAL Likelihood = -8.01 Transmembrane 225 - 241 ( 215 - 251)

INTEGRAL Likelihood = -7.01 Transmembrane 21 - 37 ( 18 - 42)

INTEGRAL Likelihood = -2.66 Transmembrane 285 - 301 ( 283 - 301)

Final Results

bacterial membrane Certainty=0.6328 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB42664 GB:AL049819 putative integral membrane protein 
[Streptomyces coelicolor A3 (2) ] 
Identities = 60/156 (38%) , Positives = 101/156 (64%) , Gaps = 1/156 (0%) 

Query: 176 LMGFMVFFFVFLISGMALLKERTSGTLDRLLATPVKRSDIVFGYMLSYGILAIIQTIVIV 235

L+G +FL++ +A L+ERTSGTL+RLLA P+ + D++ GY L++G LAI+Q+ + 

Sbjct: 77 LLGIFPLITMFLVTSIATLRERTSGTLERLLAMPLGKGDLIAGYALAFGALAIVQSALAT 136 

Query: 236 LSTIWLLDIQWGSIFSVIIVNFILALVALSLGILMSTLAKSEFQMMQFIPLIIMPQLFF 295

+W L + V GS + +++V + AL+ +LG+ +S A SEFQ +QF+P +I PQL
Sbjct: 137 GLAVWFLGLDVTGSPWLLLLVALLDALLGTALGLFVSAFAASEFQAVQFMPAVIFPQLLL 196 

Query: 296 SGI I - PLENMASWAQTVGKILPLSYSGDALTKI IMY 330 

G+ P +NM + V +LP+SY+ D + +++ + 
Sbjct: 197 CGLFTPRDNMHPALEAVSDVLPMSYAVDGMNEVLRH 232 

There is also homology to a DNA sequence which was identified in S.pyogenes <SEQ ID 681> which
encodes the amino acid sequence <SEQ ID 682>. Analysis of this protein sequence reveals the following:

Possible site: 39
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.41 Transmembrane 263 - 279 ( 246 - 284)

INTEGRAL Likelihood = -7.70 Transmembrane 231 - 247 ( 224 - 258)

INTEGRAL Likelihood = -4.99 Transmembrane 20 - 36 ( 18 - 39)

INTEGRAL Likelihood = -3.72 Transmembrane 349 - 365 ( 345 - 368)

INTEGRAL Likelihood = -3.45 Transmembrane 187 - 203 ( 182 - 204)

Final Results

bacterial membrane Certainty=0.5564 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB12662 GB:Z99108 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 92/369 (24%) , Positives = 180/369 (47%) , Gaps = 25/369 (6%) 

Query: 12 IKRKKTSYVTFFLMPILTTLLALSLSFSNNNQAKIGILDKDNSQISKQFIAQLKQNKKYD 71
I +K +Y+ F P+L T + S+ N+++ ++ I+D+D++ +S+ +I QLK +
Sbjct: 15 IFKKPQNYLIMFAAPLLLTFVFGSMLSGNDDKVRLAIVDQDDTILSQHYIRQLKAHDDMY 74

Query: 72 IFTKIKKEHIDHYLQDKSLEAVLTIDKGFSDKVLQGKSQKLNIRSIANSEITEWVKAQTN 131
+F + + L+ K + ++ I + F ++ +GK +L R VK
Sbjct: 75 VFENMSESKASEKLKQKKIAGI I VI SRSFQTQLEKGKHPELI FRHGPELSEAPMVKQYAE 134

Query: 132 YLLENYNI IGDVALGNEDTFNR ILQKNQQLNYDVKQVTLTDRSRSKAVSST 182
L NI A T +K++ + V + TL+D+ S T
Sbjct: 135

Query: 183
GF ++ ++ + IL + + ++ RL+ +++SR Y+LS+ +G++
Sbjct: 195

Query: 236
F I ++LS +F I++ P ++++++ LF L +G GL+I A + +Q NL
Sbjct: 255

Query: 296 IVMPTSMLAGCLWPLSITPSYMQAIGKLLPQNWVLSAIA-IFQSGGTLSQAWPYLLALMG 354
V+ T M++G WP+ I P +MQ+I + LPQ W +S + I +G ++ +L + G
Sbjct: 311 FVIATCMVSGMYWPIDIEPKFMQSIAEFLPQKWAMSGLTEIIANGARVTD ILGICG 366

Query: 355 TALALISFS 363
LA + +
Sbjct: 367 ILLAFAAIT 375

An alignment of the GAS and GBS proteins is shown below: 

Identities = 92/375 (24%) , Positives = 164/375 (43%) , Gaps = 66/375 (17%) 

Query: 11 IKELF RDKRTLAMMFIAPILIMFLMNVMFSANSNTKVKIGTINVNTK^ 66 

IK LF R K + FL PIL L+ + S ++N + KIG ++ + +S

Sbjct: 5 IKTLFVKIKRKKTSYVTFFLMPILTT-LLMiSLSFSNMNQAKIGILDKDNSQISK 58 

Query: 67 HIQTOSFKFNSSAKKALKSNKIDALISEDNKSYTWYAOTDSSKTTIjT-RQAFKTAVOTM 125 
+F + LK NK + ++ K + Y S + LT + F V 

Sbjct: 59 QFIAQ LKQNKKYDIFTKIKKEHIDHYLQDKSLEAVLTIDKGFSDKVLQG 107

Query: 126 NSKELISQVKILANKNPKLAQSLQTRSKYIKEKYNY GNKNT GF 168 

S++L I + N ++ + ++++ Y+ E YN GN++T + 

Sbjct: 108 KSQKL NIRSIANSEITEWVKAQTNYLLENYNIIGDVALGNEDTFNRILQKNQQLNY 163 


Query: 169 FAKMIPIL MGFMVFFFVFLISGM--ALLKERTSGTLDRLLATPVKRSD 214 

K + + GF++ + S + +L +++S RL+ + + R 

Sbjct: 164 DVKQVTLTDRSRSKAVSSTTTGFLLILMLGSTSVIYSGILADKSSQLYHRLMLSNLSR-- 221 

Query: 215 IVFGYMLSY---GILAIIQTIVIVLSTIWLLDIQWGSIFSVIIVNFILALVALSLGILM 271

F YMLSY G +A IVI+LS + + +1 ++I+ F+ +L+A+ G+L+ 

Sbjct: 222 --FRYMLSYVCVGFVAFTIQIVIMLSLLKVFNISFFVPTSLLLIIFFLFSLLAIGFGLLI 279 

Query: 272 STLAKSEFQMMQFI PLI IMPQLFFSGI I - PLENMASWAQTVGKILPLSYSGDALTKI IMY 330 
+ ++ Q Q LI+MP +G + PL S+ Q +GK+LP ++ A+ I

Sbjct: 280 GAITQNSQQSSQLANLIVMPTSMLAGCLWPLSITPSYMQAIGKLLPQNWVLSAIA-IFQS 338

Query: 331 GQGLPNVSSNLLVLL 345 
G L LL L+ 

Sbjct: 339 GGTLSQAWPYLLALM 353

A further related DNA sequence was identified in S.pyogenes <SEQ ID 9081> which encodes the amino 
acid sequence <SEQ ID 9082>. Analysis of this protein sequence reveals the following: 

Possible site: 38

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-12.52 Transmembrane 21 - 37 ( 17 - 43)

INTEGRAL Likelihood =-10.30 Transmembrane 351 - 367 ( 346 - 371)

INTEGRAL Likelihood = -5.36 Transmembrane 262 - 278 ( 260 - 285)

INTEGRAL Likelihood = -2.60 Transmembrane 288 - 304 ( 288 - 305)

INTEGRAL Likelihood = -1.81 Transmembrane 229 - 245 ( 229 - 246)

Final Results

bacterial membrane Certainty=0.6010 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS sequences follows: 

Score = 62.5 bits (149), Expect = 9e-12 

Identities = 72/382 (18%) , Positives = 166/382 (42%) , Gaps = 32/382 (8%) 

Query: 1 MVLFHLIKKESLQIFRNRTALLMMVIFPILMIVILSFAFKSSFNTATTVPKLTIRYQLEG 60 

M + + +K ++FR++ L MM + PIL++ +++ F ++ NT + + + ++ 
Sbjct: 1 MRIIAITEKVIKELFRDKRTLAMMFIAPILIMFLMNVMFSANSNTKVKIGTINVNTKWS 60 

Query: 61 EKTDYQKNFLAFLKVIMQKLHLETKPSNSLEKDRQRVSEGALTAVLEVKKNQTIKVITNN 120

L+ H++ + ++ + + A++ + N++ V N 

Sbjct: 61 N LDNIKHIQVRSFKFNSSAKKALKSNKIDALIS - EDNKSYTVFYAN 105 






Query: 121 INQQNADLINMLVKNYVDNAKTYDS IAALY PQQLNHIRKRSVDYVKVSSIQTSK 174 
+ L K V+ + + 1+ + P+ ++ RS Y+K + +
Sbjct: 106 TDSSKTTLTRQAFKTAVNTMNSKELISQVKILANKNPKLAQSLQTRS - KYIKE KYNY 161 

Query: 175 GMTSADYYA ISMFTMITFYSMMSAMNLVLSDRQQRITMRIHLTGVSPSFLVFGKLI 230 

G + ++A I M M+ F+ + + +L +R +R+ TVS +VFG ++

Sbjct: 162 GNKNTGFFAKMIPILMGFMVFFFVFLISGMALLKERTSGTLDRLLATPVKRSDIVFGYML 221

Query: 231 GAMIATWQLSLLYIFTRFVLRVNWGTNEWMLIGITASLVYLSVAIGIGLGISIKNEAFL 290 
+ +Q ++ + T ++L + + + +1 + L +++++GI + K+E + 
Sbjct: 222 SYGILAIIQTIVIVLSTIWLLDIQWGSIFSVIIVNFILALVALSLGILMSTLAKSEFQM 281

Query: 291 TVASNTIIPIFAFLGGSYVPLTTLHSSIINQLSNISPIKWVNDSLFYLIFGGQYNP-IPV 349 

II F G +PL + +S + I P+ + D+L +1 GQ P + 
Sbjct: 282 MQFIPLIIMPQLFFSG-IIPLENM-ASWAQTVGKILPLSYSGDALTKIIMYGQGLPNVSS 339


Query: 350 TLIVNISIGTIFIILALIGMRK 371 

L+V + 11+ G+++ 
Sbjct: 340 NLLVLLLFLIILTIANIFGLKR 361 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 215 

A DNA sequence (GBSx0229) was identified in S.agalactiae <SEQ ID 683> which encodes the amino acid 
sequence <SEQ ID 684>. This protein is predicted to be CG1718 gene product (b0794). Analysis of this 
protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.17 Transmembrane 118 - 134 ( 117 - 134)

Final Results

bacterial membrane Certainty=0.1468 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8521> which encodes amino acid sequence <SEQ ID 8522>
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -10.96 
GvH: Signal Score (-7.5): -4.84 
Possible site: 15

>>> Seems to have no N-terminal signal sequence

ALOM program count: 1 value: -1.17 threshold: 0.0 

INTEGRAL Likelihood = -1.17 Transmembrane 142 - 158 ( 141 - 158) 
PERIPHERAL Likelihood = 4.98 197 
modified ALOM score: 0.73

*** Reasoning Step: 3 

Final Results

bacterial membrane Certainty=0.1468 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF50837 GB:AE003568 CG1718 gene product [Drosophila melanogaster]

Identities = 80/204 (39%) , Positives = 123/204 (60%) , Gaps = 3/204 (1%) 



Query: 7 EIIGLIGPSGAGKSTLIKTMLGMEKADKGTALV- -LDTQMPDRNILNQIGYMAQSDALYE 64 
E GL+G +GAGK+T KMGE+ GAVL+ +1 IGY Q DAL + 
Sbjct: 1394 ECFGLLGVNGAGKTTTFKMMTGDERISSGAAYVQGLSLESNMNSIYKMIGYCPQFDALLD 1453

Query: 65 SLTGLENLLFFGKMKGIQKTELKQQITHISKWDLENQLDKFVSGYSGGMKRRLSLAIAL 124 

LTG ELF ++G+Q++ ++Q ++K +DK YSGG KR+LS AIA+ 

Sbjct: 1454 DLTGREVLRIFCMLRGVQESRIRQLSEDLAKSFGFMKHIDKQTHAYSGGNKRKLSTAIAV 1513 

Query: 125 LGNPTVLILDEPTVGIDPSLRRKIWQELINIKDEGHSIFITTHVMDEAE-LTSKVALLLR 183 

+G+P+V+ LDEPT G+DP+ RR++W + I+D G SI +T+H M+E E L +++A+++ 
Sbjct: 1514 IGSPSVIYLDEPTTGmPAARRQLWNWCRIRDSGKSIVLTSHSMEECEALCTRLAlMVN 1573 

Query: 184 GNI IAFDTPLHLKKQFNVSTIEEV 207 

G + HLK +F+ I ++ 

Sbjct: 1574 GEFKCIGSTQHLKNKFSKGLILKI 1597

Identities = 73/216 (33%) , Positives = 128/216 (58%) , Gaps = 9/216 (4%)

Query: 1 MEVFKGEIIGLIGPSGAGKSTLIKTMLGMEKADKGTALV--LDTQMPDRNILNQIGYMAQ 58 

M +F+ EI L+G +GAGK+T I + GM GTA++ D + +G Q 

Sbjct: 536 MNMFEDEITVLLGHNGAGKTTTISMLTGMFPPTSGTAIINGSDIRTNIEGARMSLGICPQ 595 

Query: 59 SDALYESLTGLENLLFFGKMKGIQKTELKQQITHISKWDLENQLDKFVSGYSGGMKRRL 118

+ L++ ++ ++ FF +MKG++ ++Q++ K+++LE++ + S SGGMKR+L 
Sbjct: 596 HNVX.FDEMSVSNHIRFFSRMKGLRGKAVEQEVAKYLKMIELEDKANVASSKLSGGMKRKL 655 

Query: 119 SLAIALLGNPTVLILDEPTVGIDPSLRRKIWQELINIKDEGHSIFITTHVMDEAE-LTSK 177 
S+ AL G+ V++ DEP+ G+DPS RR++W +L+ + G ++ +TTH MDEA+ L +

Sbjct: 656 SVCCALCGDTKWLCDEPSSGMDPSARRQLW-DLLQQEKVGRTLLLTTHFMDEADVLGDR 714 

Query: 178 VALLLRGNI IAFDTPLHLKKQFN VSTIEEVF 208 

+A++ G + T LKKQ+ VS ++ +F 

Sbjct: 715 IAIMCDGELKCQGTSFFLKKQYGSGYRLVSGVQNLF 750

A related DNA sequence was identified in S.pyogenes <SEQ ID 685> which encodes the amino acid 
sequence <SEQ ID 686>. Analysis of this protein sequence reveals the following: 

Possible site: 59
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.43 Transmembrane 49 - 65 ( 49 - 65)

Final Results

bacterial membrane Certainty=0.1171 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB12660 GB:Z99108 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis]

Identities = 151/316 (47%) , Positives = 202/316 (63%) , Gaps = 18/316 (5%) 

Query: 4 VQLTNWKSYKNGKKA- VNDVSLSIEAGNIYGLLGPNGAGKSTLINLILGLIPLSSGKIT 62 
+Q N+ K+Y GKK V +S S++ G +GLLGPNGAGKST I++I GL+P SG IT 
Sbjct: 2 LQAENIKKAY--GKKTIVKGISFSLKKGESFGLLGPNGAGKSTTISMISGLVPHDSGNIT 59

Query: 63 VLGQS-QKTIRKISSQIGYVPQDIAvYPDLTAYENVELFGSLYGLKGAQLKKQVLKSLEF 121 

V G K K +IG VPQ+IA+YP LTA+EN+ +G +YGL + KK+ + LE+ 
Sbjct: 60 VGGYVIGKETAKAKQKIGIVPQEIALYPTLTAHENLMFWGKMYGLTHDEAKKRAREVLEY 119 


Query: 122 VGLHSQAKQFPSQFSGGMKRRLNIACALVHSPKLIIFDEPTVGIDPQSRNHILESIRLLN 181 

VGL +AK FSGGMKRR+NI AL+H P+L+I DEPTVGIDPQSRNHILE+++ LN 

Sbjct: 120 VGLTERAKDKIETFSGGMKRRINIGAALMHKPELLIMDEPWGIDPQSRNHILETVKQLN 179 

Query: 182 KEGATVIYTTHYMEEVEALCDYI FIMDHGQVIEEGPKFELEKRYVANLANQI I VTLTDSR 241

+ G TVIYT+HYMEEVE LCD I I+D G++I G K +L R + Q+ V+ + 
Sbjct: 180 ETGMTVI YTSHYMEEVEFLCDRIGI IDQGEMIAIGTKTDLCSRLGGDTI IQLTVSGINEA 239 






Query: 242 HL ELADKPDWSLIEDGEKLMLKIDNSD MTSWHQLTQANITFSEIRHNHL 291 

L LA D ++ E L LKID S +TS++ + T +1 ++ 
Sbjct: 240 FLVAIRSLAHVNDVTVHE LELKIDISAAHHEKWTSLLAEATAHHINLLSLQVQEP 295

Query: 292 NLEE I FLHLTGKKLRD 307 
NLE +FL+LTG+ LRD 
Sbjct: 296 NLERLFLNLTGRTLRD 311

An alignment of the GAS and GBS proteins is shown below: 

Identities = 81/211 (38%), Positives = 125/211 (58%), Gaps = 2/211 (0%) 

Query: 1 MEVFKGEIIGLIGPSGAGKSTLIKTMLGMEKADKGTALVL-DTQMPDRNILNQIGYMAQS 59

+ + G I GL+GP+GAGKSTLI +LG+ G VL +Q R I +QIGY+ Q 

Sbjct: 25 LSIEAGNIYGLLGPNGAGKSTLINLILGLIPLSSGKITVLGQSQKTIRKISSQIGYVPQD 84 

Query: 60 DALYESLTGLENLLFFGKMKGIQKTELKQQITHISKWDLENQLDKFVSGYSGGMKRRLS 119 
A+Y LT EN+ FG + G++ +LK+Q+ + V L +Q +F S +SGGMKRRL+

Sbjct: 85 IAVYPDLTAYENVELFGSLYGLKGAQLKKQVLKSLEFVGLHSQAKQFPSQFSGGMKRRLN 144 

Query: 120 LAIALLGNPTVLILDEPTVGIDPSLRRKIWQELINIKDEGHSIFITTHVMDEAE-LTSKV 178 
+A AL+ +P ++I DEPTVGIDP R I + + + EG ++ TTH M+E EL + 
Sbjct: 145 IACALVHSPKLIIFDEPTVGIDPQSRNHILESIRLLNKEGATVIYTTHYMEEVEALCDYI 204

Query: 179 ALLLRGNI IAFDTPLHLKKQFNVSTIEEVFL 209 

++ G +1 L+K++ + ++ + 

Sbjct: 205 FIMDHGQVIEEGPKFELEKRYVANLANQIIV 235 


SEQ ID 8522 (GBS391) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 74 (lane 7; MW 30kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 83 (lane 4; MW 55kDa). 

GBS391-GST was purified as shown in Figure 217, lane 3. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 216 

A DNA sequence (GBSx0230) was identified in S.agalactiae <SEQ ID 687> which encodes the amino acid 
sequence <SEQ ID 688>. Analysis of this protein sequence reveals the following: 

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.6732 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 217 

A repeated DNA sequence (GBSx0231) was identified in S.agalactiae <SEQ ID 689> which encodes the 
amino acid sequence <SEQ ID 690>. This protein is predicted to be ISL2 protein. Analysis of this protein 
sequence reveals the following:

Possible site: 58

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC18596 GB:AJ278419 IS1381 transposase [Streptococcus pneumoniae] 
Identities = 111/129 (86%), Positives = 117/129 (90%) 

Query: 1 MKAQAIVTSQGRIVSLDIAVNYCHDMKLFKMSRRNIGQAAKILADSGYQGIMKMYSQAQT 60 
MK QAIVTSQGRIVSLDI VNYCHDMKLFKMSRRNIGQA KILADSGYQG+MK+Y QAQT 
Sbjct: 1 MKTQAIVTSQGRIVSLDITVNYCHDMKLFKMSRRNIGQAGKILADSGYQGLMKIYPQAQT 60 

Query: 61 PRKSSKLKPLTLEDKTYNHTLSKERIKVENIFAKVKTFKIFSTTYRNRRKRFGLRMNLIA 120 
RKSSKLKPLT+EDK NH LSKER KVENIFAKVKTFK+FSTTYR+ RKRFGLRMNL A 
Sbjct: 61 SRKSSKLKPLTVEDKACNHALSKERSKVENIFAKVKTFKMFSTTYRSHRKRFGLRMNLSA 120 

Query: 121 GMINRELGF 129 
G+IN ELGF 
Sbjct: 121 GIINHELGF 129 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
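
Each alignment above is introduced by a summary line of the form "Identities = 111/129 (86%), Positives = 117/129 (90%)". As a hedged illustration only, the sketch below shows one way to pull the counts out of such a line and recompute the ratio, which can help spot a misread digit. The names `parse_summary` and `percent` are assumptions made for this example; the printed percentages are whole numbers, so only approximate agreement with the recomputed value should be expected.

```python
import re

# One (name, numerator, denominator, printed percent) group per statistic.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return {name: (count, aligned_length, printed_percent)}."""
    stats = {}
    for name, num, den, pct in SUMMARY_RE.findall(line):
        stats[name] = (int(num), int(den), int(pct))
    return stats


def percent(count, aligned_length):
    """Recompute the percentage as a float for comparison with the printed one."""
    return 100.0 * count / aligned_length
```

For the IS1381 transposase alignment of Example 217, `parse_summary` would return the pair 111 and 129 for the identities, and `percent(111, 129)` gives the unrounded value behind the printed 86%.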

Example 218 

A repeated DNA sequence (GBSx0232) was identified in S.agalactiae <SEQ ID 691> which encodes the 
amino acid sequence <SEQ ID 692>. This protein is predicted to be ISL2 protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3996 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC18595 GB:AJ278419 IS1381 transposase [Streptococcus pneumoniae] 
Identities = 110/125 (88%), Positives = 119/125 (95%) 

Query: 1 MNYFASKQLTDWFKRLVGVQRTTFEEMLAVLKTAYQRKHAKGGRTPKLSLEDLLMATLQ 60 
MNYFASKQLTD RFKRLVGVQRTTFEEMLAVLKTAYQ KHAKGGR PKLSLEDLLMATLQ 
Sbjct: 1 MNYFASKQLTDARFKRLVGVQRTTFEEMLAVLKTAYQIJKHAKGGRKPKLSLEDLLMATLQ 60 

Query: 61 YMREYRTYEQIAADFGIHESNLIRRSQWVESTLIQSGFTISKTHLSAEDTVIVDATEVKI 120 
Y+REYRTYE+IAADFG+HESNL+RRSQWVE TL+QSG TIS+T LS+EDTV++DATEVKI 
Sbjct: 61 YWEYRTYEEIAADFGVHESNLLRRSQWVEOTLVQSGVTISRTPLSSEDTVMIDATEVKI 120 

Query: 121 NRPKK 125 
NRPKK 
Sbjct: 121 NRPKK 125 






No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 219 

A DNA sequence (GBSx0233) was identified in S.agalactiae <SEQ ID 693> which encodes the amino acid 
sequence <SEQ ID 694>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-10.40 Transmembrane 130 - 146 ( 123 - 156) 

INTEGRAL Likelihood = -7.86 Transmembrane 169 - 185 ( 167 - 191) 

INTEGRAL Likelihood = -6.90 Transmembrane 100 - 116 ( 95 - 118) 

INTEGRAL Likelihood = -5.52 Transmembrane 199 - 215 ( 189 - 216) 

Final Results 

bacterial membrane Certainty=0.5161 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04126 GB:AP001508 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 47/207 (22%), Positives = 95/207 (45%), Gaps = 14/207 (6%) 

Query: 7 LQKENTLLEGRIDNSNNQTYTDMIVYLRGA-SISPYHQELIRNDIVNMLLEAQERQASLV 65 
L K+N + N + Y D+++Y+R ASS E + ++++ LLEAQ + S 
Sbjct: 6 LIKDNNEKRKLLTEENLKVYEDLLLYIRLRHSKSEQETEELLTELLDHLLEAQAKGKSAK 65 

Query: 66 SVFGEDRHDFINQVIKSTPKISKKEE-TLQRWDLAILLLTIQMIIFLGGYLITEALQQSV 124 
+VFG++ + PK+ KE L + L++ T+ ++F G Y + V 
Sbjct: 66 AVFGDNPKQYADEIIGEIPKMVTKERFGLFAYGLSMFPATV--LVFSGIYRMLRYYVFQV 123 

Query: 125 PDLIPITLLDVLFAIFISIIAVKIADTIIYATYNFDK SKEKKYFFRYIFLILSLII 180 
+ + + A+ +I ++ IA ++ + + + K F +I + +I 
Sbjct: 124 GEAVSEVYVGT--ALITTIASIVIAWMFVFVVFQYFRWSCFRTINKVFEFFILWLGGMIP 181 

Query: 181 AYILIGKYYHLP FINIPLWIYLI 203 
+ Y P I IP+++Y + 
Sbjct: 182 FALFFALLYFTPNVGRMIEIPVYLYFV 208 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
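
The "INTEGRAL Likelihood ... Transmembrane" lines listed for Example 219 are printed in order of likelihood rather than position. As an illustrative sketch only (the function name `parse_tm` and the field name `outer` for the parenthesised range are assumptions made here, not terms from the prediction program), the segments could be collected and sorted along the sequence as follows.

```python
import re

# Example line:
#   "INTEGRAL Likelihood =-10.40 Transmembrane 130 - 146 ( 123 - 156)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm(text):
    """Return predicted transmembrane segments sorted by start position."""
    segments = []
    for lik, start, end, outer_start, outer_end in TM_RE.findall(text):
        segments.append({
            "likelihood": float(lik),          # more negative = stronger call
            "start": int(start),
            "end": int(end),
            "outer": (int(outer_start), int(outer_end)),  # parenthesised range
        })
    return sorted(segments, key=lambda seg: seg["start"])
```

Sorting by position makes it easier to see that the four predicted segments of this protein lie between residues 100 and 215.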

Example 220 

A DNA sequence (GBSx0234) was identified in S.agalactiae <SEQ ID 695> which encodes the amino acid 
sequence <SEQ ID 696>. This protein is predicted to be minor extracellular protease epr precursor (epr). 
Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-10.72 Transmembrane 10 - 26 ( 5 - 33) 

Final Results 

bacterial membrane Certainty=0.5288 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 8523> which encodes amino acid sequence <SEQ ID 8524> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: 12.11 
GvH: Signal Score (-7.5): -4.02 

Possible site: 29 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 1 value: -10.72 threshold: 0.0 

INTEGRAL Likelihood =-10.72 Transmembrane 8 - 24 ( 5 - 33) 
PERIPHERAL Likelihood = 13.74 219 

modified ALOM score: 2.64 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.5288 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

!GB:Z99123 extracellular serine protease [Bacillus s... 

>GP:CAB15866 GB:Z99123 extracellular serine protease [Bacillus subtilis] 
Identities = 44/150 (29%), Positives = 80/150 (53%), Gaps = 14/150 (9%) 

Query: 37 QMDTVESSVNHVSDSQLTEAQDMLDKFEKKPSEKLLKDVELALNKLSNSSKKEALQKRFK 96 
++D V+S N + +A+D + K EK +++ + + A+NKL N + K+ LQKR 
Sbjct: 428 RLDKVQSYRN VKDAKDKVAKAEKSKTQQTVDTAQTAINKLPNGTDKKNLQKRLD 481 

Query: 97 KAKDKYLKDEADKKATKDATDLVEILEQAPSEENVLKAEAAVNKLTVKESKEALQKRIDT 156 
+ K +Y+ A+K A D V E++ + +V A++A+ KL K +LQKR++ 
Sbjct: 482 QVK-RYI ASKQAKDKWAKAEKSKKKTDVDSAQSAIGKLPASSEKTSLQKRLNK 533 

Query: 157 VKTQYGLIGNQTPSSSVAETTEQGTANPAS 186 
VK+ Q+ S++ ++T+ A S 
Sbjct: 534 VKSTNLKTAQQSVSAAEKKSTDANAAKAQS 563 

Identities = 39/124 (31%), Positives = 64/124 (51%), Gaps = 2/124 (1%) 

Query: 35 TTQMDTVESSVNHVSDSQLTEAQDMLDKFEKKPSEKLLKDVELALNKLSNSSKKEALQKR 94 
+++ +++ +N V + L AQ + EKK ++ + A+N+L K ALQKR 
Sbjct: 521 SSEKTSLQKRLNKVKSTNLKTAQQSVSAAEKKSTDANAAKAQSAVNQLQAGKDKTALQKR 580 

Query: 95 FKKAKDKYLKDEADKKATKDATDLVEILEQAPSEENVLKAEAAVNKLTVKESKEALQKRI 154 
KKK EAKTA V+E+ ++++ A++AVN+L K LQKR+ 
Sbjct: 581 LDKVKKKVAAAEAKKVETAKAK--VKKAEKDKTKKSKTSAQSAVNQLKASNEKTKLQKRL 638 

Query: 155 DTVK 158 
+ VK 
Sbjct: 639 NAVK 642 



A related DNA sequence was identified in S.pyogenes <SEQ ID 697> which encodes the amino acid 
sequence <SEQ ID 698>. Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.99 Transmembrane 24 - 40 ( 23 - 43) 

Final Results 

bacterial membrane Certainty=0.2996 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:CAB15866 GB:Z99123 extracellular serine protease [Bacillus subtilis] 
Identities = 43/130 (33%), Positives = 71/130 (54%), Gaps = 8/130 (6%) 

Query: 41 GSHPQTQDKVA---KHSKSAASLLKKAVKAVNDADRLATAAAIQEAQKAVDKLAESSKKK 97 
G P + +K + + +K ++ LK A ++V+ A++ +T A +AQ AV++L K 
Sbjct: 516 GKLPASSEKTSLQKRLNKVKSTNLKTAQQSVSAAEKKSTDANAAKAQSAVNQLQAGKDKT 575 

Query: 98 TLQEQLN-----VAKAKQEQEDAATQAVKAAEETLNQNLKDIAQKAVNDLSNKGKKAALQ 152 
LQ++L+ VA A+ ++ + A VK AE+ + K AQ AVN L +K LQ 
Sbjct: 576 ALQKRLDKVKKKVAAAEAKKVETAKAKVKKAEKDKTKKSKTSAQSAVNQLKASNEKTKLQ 635 

Query: 153 SRLDAILPAK 162 
RL+A+ P K 
Sbjct: 636 KRLNAVKPKK 645 

Identities = 31/105 (29%), Positives = 53/105 (49%), Gaps = 1/105 (0%) 

Query: 54 SKSAASLLKKAVKAVNDADRIATAAAIQEAQKAVDKLAESSKKKTLQEQLNVAKAKQEQE 113 
+++ S A +AV A++ I +A++ + +L S K L ++L+ ++ + + 
Sbjct: 380 AQATDSAYAAAEQAVKKAEQTKAQIDINKARELISQLPNSDAKTALHKRLDKVQSYRNVK 439 

Query: 114 DAATQAVKAAEETLNQNLKDIAQKAVNDLSNKGKKAALQSRLDAI 158 
DA + KA E+ Q D AQ A+N L N K LQ RLD + 
Sbjct: 440 DAKDKVAKA-EKYKTQQTVDTAQTAINKLPNGTDKKNLQKRLDQV 483 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 61/233 (26%), Positives = 115/233 (49%), Gaps = 13/233 (5%) 

Query: 2 SMKIDKKELL^IASIILLIFASVTFFLFKDHGTTQMDTVESSVNHVSDSQLTEAQDMLD 61 
SM +KE L + S++ + + +F H TQ + S + + S L +A ++ 
Sbjct: 12 SMTKSQKEALYWMLSVLTITLIGGSCLIFGSHPQTQDKVAKHSKS--AASLLKKAVKAVN 69 

Query: 62 KFEKKPSEKLLKDVELALNKLSNSSKKEALQKRFKKAKDKYLKDEADKKATKDATDLVEI 121 
++ + +++ + A++KL+ SSKK+ LQ++ AK K +++A AT V+ 
Sbjct: 70 DADRIATAAAIQEAQKAVDKLAESSKKKTLQEQLNVAKAKQEQEDA ATQAVKA 122 

Query: 122 LEQAPSEENVLKAEAAVNKLTVKESKEALQKRIDTVKTQYGLIGNQTPSSSVAETTEQGT 181 
E+ ++ A+ AVN L+ K K ALQ R+D + +I ++ P S E T+ 
Sbjct: 123 AEETLNQNLKDIAQKAVNDLSNKGKKAALQSRLDAILPAKPII-DEFPRQS-GEITDNSY 180 

Query: 182 ANPASQDTSSYWQIWAPTYE-QPQAI^PVTPGVNNTVP-TPGTGTVPATNG 232 
P D S + + +PT + +++ + VTP ++ P P T + P+ +G 
Sbjct: 181 WTPFPGDVSDTYDNSQSPTLDPSSESSASDVTPQPSHPDPIPPQTSSEPSDSG 233 

SEQ ID 8524 (GBS278) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 6; MW 40kDa). 

The GBS278-His fusion product was purified (Figure 206, lane 10) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 305), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 221 

A DNA sequence (GBSx0235) was identified in S.agalactiae <SEQ ID 699> which encodes the amino acid 
sequence <SEQ ID 700>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1466 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 222 

A DNA sequence (GBSx0236) was identified in S.agalactiae <SEQ ID 701> which encodes the amino acid 
sequence <SEQ ID 702>. This protein is predicted to be N-acetylglucosamine-6-phosphate deacetylase 
(nagA). Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4607 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9297> which encodes amino acid sequence <SEQ ID 9298> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG21688 GB:AY007718 N-acetylglucosamine-6-phosphate deacetylase 
[Lactococcus lactis subsp. cremoris] 
Identities = 113/178 (63%), Positives = 135/178 (75%) 

Query: 131 GIYFEGPYFTEEYKGAQNPIYMRNPNLEEFAQWQKAAKGLITKIALAPEREGVEEFVSAI 190 
GI+FEGP+FTEE KGAQNP YMR+ + E WQ+AA G++ KI LAPEREG E+F+ 
Sbjct: 1 GIFFEGPFFTEEKKGAQNPKYMRDAKMWELEDWQEAAHGMLKKIGLAPEREGSEDFIRKA 60 

Query: 191 TKQGVTVALGHSNGTYKEAKKAVKAGASVWVHAYNGMRGLTHREPGMVGAVYNLPNTYAE 250 
T+ GV +ALGHSN TYK+A V+AGASVWVH +NGM G+TH+EPGMVGA+ N PNTYAE 
Sbjct: 61 TESGVVIALGHSNATYKQAVAGVQAGASVWVHTFNGMSGMTHQEPGMVGAILNTPNTYAE 120 

Query: 251 LICDGHHVDPVACDILMTQKGHNHVALITDCMAAGGAPDGDYMLGELPVVVSNGTARL 308 
LICDGHHV P A +I++ KG +HV LITD MAG PDG YMLGE V V +G A L 
Sbjct: 121 LICDGHHVRPEAAEIVVKMKGADHVVLITDSMRAAGLPDGPYMLGEYEVEVRDGAA.WL 178 

A related DNA sequence was identified in S.pyogenes <SEQ ID 703> which encodes the amino acid 
sequence <SEQ ID 704>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3114 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 227/300 (75%), Positives = 262/300 (86%) 

Query: 9 MTKYIKADRFFYADHVKENGYLEIKDNHFGKWIENISGQEEILDYSGYQIAPGLVDTHIH 68 
MT Y+KAD F+Y V+ GYL + D FG+W E + +I+DY+GYQIAPGLVDTHIH 
Sbjct: 1 MTCYLKADCFYYPTETOPAGYLSLHDGVFGEWTEIVPADAQIIDYTGYQIAPGLVDTHIH 60 

Query: 69 GFAGADVMDCDSEGILRMSAGLLSTGVTSFLPTTLTSDTKRLEEASKSVAAVAGKEQGAK 128 
G+AGADVMD ++GI +MS GLL+TGVTSFLPTTLTS ++LE+ S ++A+VA + +GAK 
Sbjct: 61 GYAGADVMDNSAQGIHQMSEGLLATGVTSFLPTTLTSTFEQLEKVSGTIASVADQVKGAK 120 

Query: 129 IQGIYFEGPYFTEEYKGAQNPIYMRNPNLEEFAQWQKAAKGLITKIALAPEREGVEEFVS 188 
IQGIYFEGPYFTEEYKGAQNP YM+ P LEEF WQKAAKGLI KIALAPER+GV+EFVS 
Sbjct: 121 IQGIYFEGPYFTEEYKGAQNPSYMKTPRLEEFDAWQKAAKGLIKKIALAPERDGVKEFVS 180 

Query: 189 AITKQGVTVALGHSNGTYKEAKKAVKAGASVWVHAYNGMRGLTHREPGMVGAVYNLPNTY 248 
A+TKQGVTVALGHSNGTY+EAK+AV+AGASVWVHAYNGMRGLTHREPGMVGAVYNLPNTY 
Sbjct: 181 AOTKQGVTVALGHSNGTYQEAKEAVQAGASVWVHAYNGMRGLTHREPGMVGAVYNLPNTY 240 

Query: 249 AELICDGHHVDPVACDILMTQKGHNHVALITDCMAAGGAPDGDYMLGELPVVVSNGTARL 308 
AELICDGHHV P+ACDILM QKGH+HVA+ITDCM AGG+PDGDY+LGE VVV+NGTARL 
Sbjct: 241 AELICDGHHVSPIACDILMQQKGHDHVAMITDCMRAGGSPDGDYLLGEFSVVVANGTARL 300 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 223 

A DNA sequence (GBSx0237) was identified in S.agalactiae <SEQ ID 705> which encodes the amino acid 
sequence <SEQ ID 706>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3709 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9307> which encodes amino acid sequence <SEQ ID 9308> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



>GP:CAB16112 GB:Z99124 yyaQ [Bacillus subtilis] 
Identities = 40/110 (36%), Positives = 62/110 (56%), Gaps = 12/110 (10%) 

Query: 121 IAKTFEDSVDYPFAKHPQYASYRVSG--KWYALLFPLKMGKLENVPAQLSED EVEVL 175 
+ + + S DYP+ K+P YAS R + KWY L+ + +P +L D E+++L 
Sbjct: 11 VKEKYGTSPDYPWEKYPNYASLRHTSNKKWYGLIMNV LPEKLGLDGHGEIDIL 63 

Query: 176 NIKVNPQDMEILLQKEGIYPSYHMSKKTWVSIVLDNTLSDIEIFKLVSDS 225 
N+K P+ + L E I P YHM K+ W+SIVL+ T + EI+ L+ S 
Sbjct: 64 NLKCPPEISDRLRNGENILPGYHMDKEHWISIVLERTDPEGEIYNLIEQS 113 

A related DNA sequence was identified in S.pyogenes <SEQ ID 707> which encodes the amino acid 
sequence <SEQ ID 708>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2541 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 114/247 (46%), Positives = 169/247 (68%), Gaps = 1/247 (0%) 



Query: 7 MSIESDFFRKKRFIFSSLEEFGFIKSDQEYIYCQTFMDNDFKAIITISLDGKIAGKVIDS 66 
MS+ +D+F ++ I L +GF K D Y Y + FM+ +F+A + I G I +VID 
Sbjct: 1 MSIATDYFSRQTPIVEKLMAYGFEKRDNGYFYNERFMEGEFEAQLRIDEAGNIWDRVIDC 60 

Query: 67 ALEEEYLPLRAANYNGSFVGEVRSAYMAILGDISDSCCKDLLFTKDQSNRLAEKIAKTFE 126 
LEE+YLPL+ A + G++ G+VR+AY+ +L +S +C + F Q+NRLA+ I K + 
Sbjct: 61 DLEEDYLPLQQAAWQGTYTGQVRAAYLELLERLSVACFEATPFQSMQANRLAKHITKEWS 120 

Query: 127 DSVDYPFAKHPQYASYRVSGKWYALLFPLKMGKLENVPAQLSEDEVEVLNIKVNPQDMEI 186 
D +DYPF KHP A+YRV GKWYA++F L KL+ +P +L EV+ +KVNP+ 
Sbjct: 121 DPMDYPFEKHPDIATYRVGGKWYAMIFSLIADKLDQIPERLVGQTCEVMTVKVNPKAFPQ 180 

Query: 187 LLQKEGIYPSYHMSKKTWVSIVLDNTLSDIEIFKLVSDSRKLVSHNKKSN-SEPEFWIIP 245 
LLQ+EGIYP+YHMSKK W+SI+LD+ ++D +++ LV+ SR+LV+ N SM + P++W+IP 
Sbjct: 181 LLQQEGIYPAYHMSKK^ISIILDDKOTDDKLOTLVTQSRQLVNPNGLSNPNGPDYWVIP 240 

Query: 246 ANPKFYD 252 
AN K+YD 
Sbjct: 241 ANLKYYD 247 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 224 

A DNA sequence (GBSx0238) was identified in S.agalactiae <SEQ ID 709> which encodes the amino acid 
sequence <SEQ ID 710>. This protein is predicted to be transposase for insertion sequence element is905. 
Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1824 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9601> which encodes amino acid sequence <SEQ ID 9602> 
was also identified. 

A related GBS nucleic acid sequence <SEQ ID 9595> which encodes amino acid sequence <SEQ ID 9596> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA25167 GB:L20851 transposase [Lactococcus lactis] 
Identities = 325/391 (83%), Positives = 365/391 (93%) 

Query: 12 MTQFTTELLNFLAQKQDIDEFFRSSLETAMNDLLQVELSAFLGYEPYDKAGYNTGNSRNG 71 
MTQFTTELLNFLAQKQDIDEFFR+SLETAMNDLLQ ELSAFLGYEPYDK GYN+GNSRNG 
Sbjct: 1 MTQFTTELLNFLAQKQDIDEFFRTSLETAMNDLLQAELSAFLGYEPYDKVGYNSGNSRNG 60 

Query: 72 AYTRRFETKYGVVNLLIPRDRNGEFSPALIPSYGRRDNHLEEMVIKLYRTGVTTREISDI 131 
+Y+R+FETKYG V L IPRDRNG FSPAL+P+YGRRD+HLEEMVIKLY+TGVTTREISDI 
Sbjct: 61 SYSRQFETKYGTVQLSIPRDRNGNFSPALLPAYGRRDDHLEEMVIKLYQTGVTTREISDI 120 

Query: 132 IERMYGHHYSPATVSNISKATQENVASFHERSLEANYTVLYLDGTYLPLRRGTVSKECIH 191 
IERMYGHHYSPAT+SNISKATQENVA+FHERSLEANY+VL+LDGTYLPLRRGTVSKECIH 
Sbjct: 121 IERMYGHHYSPATISNISKATQENVATFHERSLEANYSVLFLDGTYLPLRRGTVSKECIH 180 

Query: 192 IALGVTSYGHKAILGYDIAPNENNASWSDLLERFKGQGVQQVSLVVSDGFNGLDQLIQQA 251 
IALG+T G KA+LGY+IAPNENNASWS LL++ + QG+QQVSLVV+DGF GL+Q+I QA 
Sbjct: 181 IALGITPEGQKAVLGYEIAPNENNASWSTLLDKLQNQGIQQVSLVVTDGFKGLEQIISQA 240 

Query: 252 FP^KQQRCLWIGRNIASKVKRADRALILEQFKTIYRAINVEEAKQALDSFINEWKPHY 311 
+P+AKQQRCL+HI RN+ASKVKRADRA+ILEQFKTIYRA N+E A QAL++FI EWKP Y 
Sbjct: 241 YPIAKQQRCLIHISRNIASKVKRADRAVILEQFKTIYRAENLEMAVQALENFIAEWKPKY 300 

Query: 312 KIWIETLESIElSnjLIFYEFPHQIWGSIYSTNLIESLNKEIKRQTKKKVVFPNEESLERYL 371 
+KV+E+LE+ +NLL FY+FP+QIW SIYSTNLIESLNKEIKRQTKKKV+FPNEE+LERYL 
Sbjct: 301 RKVMESLENTDNLLTFYQFPYQIWHSIYSTNLIESLNKEIKRQTKKKVLFPNEEALERYL 360 

Query: 372 VTLFSDYNFKQGQRIHKGFGQCTDTLESLFD 402 
VTLF DYNFKQ QRIHKGFGQC DTLESLFD 
Sbjct: 361 VTLFEDYNFKQSQRIHKGFGQCADTLESLFD 391 

A related DNA sequence was identified in S.pyogenes <SEQ ID 711> which encodes the amino acid 
sequence <SEQ ID 712>. Analysis of this protein sequence reveals the following: 

Possible site: 15 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3054 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 111/128 (86%), Positives = 122/128 (94%) 

Query: 12 MTQFTTELLNFIAQKQDIDEFFRSSLETAMNDLLQVELSAFLGYEPYDKAGYNTGNSRNG 71 
MTQFTTELLNFIAQKQDIDEFFRSSLE AMNDLLQVELSAFLGYEPY+K GYNTGNSRNG 
Sbjct: 1 MTQFTTELLNFIAQKQDIDEFFRSSLEIAMNDLLQVELSAFLGYEPYEKEGYNTGNSRNG 60 

Query: 72 AYTRRFETKYGVVNLLIPRDRNGEFSPALIPSYGRRDNHLEEMVIKLYRTGVTTREISDI 131 
Y+R+FETKYG+VNL+IPRDRNGEFSP L+PSY RR++HLEE+VIKLY+TGVTTREISDI 
Sbjct: 61 TYSRQFETKYGLVNLIIPRDRNGEFSPVLLPSYARREDHLEEIVIKLYQTGVTTREISDI 120 

Query: 132 IERMYGHH 139 
I+RMYG H 
Sbjct: 121 IKRMYGDH 128 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 225 

A DNA sequence (GBSx0239) was identified in S.agalactiae <SEQ ID 713> which encodes the amino acid 
sequence <SEQ ID 714>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-12.42 Transmembrane 268 - 284 ( 260 - 286) 
INTEGRAL Likelihood = -6.32 Transmembrane 232 - 248 ( 231 - 254) 

Final Results 

bacterial membrane Certainty=0.5967 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 






>GP:AAD40365 GB:AF036485 hypothetical protein [Plasmid pNZ4000] 
Identities = 69/283 (24%) , Positives = 133/283 (46%) , Gaps = 9/283 (3%) 

Query: 11 INVDDLSLQEERF - LPSELLAYARDENESS - FVRDIEGHLAL VYQLLDTQGHVDDVRHVP 68 

IN ++ + E+++ + +++ Y D +ES+ +V DI L L D +R++ 

Sbjct: 19 INAEERATLEDQYGIDEDIIEYVTDNDESTNYVYDINEDDQLFIFLAPYALDKDALRYIT 78 

Query: 69 RVIPVTLFLKEDGLFVLANHKNIMLVKKALNRV EKVDSPKHLLLSLVTAFSKQYFDV 125 

+ P+ L+LFNIVAL +VS +L+ + + 

Sbjct: 79 Q--PFGMLLHKGVLFTF-NQSGIPETtfNTALYSALDNPEVKSVDAFILETLFTVWSFIPI 135 

Query: 126 LDTISEERDKLIlTOLRKRPNKSNIARMISn^SGT^ 185 

I+++R+ L L ++ S+L L+ LQ L + N L L 

Sbjct: 136 SRAITm^LDKMLNRKTKNSDLVSLSYLQQTLTFLSSAVQTNLSELDRLPKTHFGVGA 195 

Query: 186 TRNEKMQLQDAI IEARQLSNMCSLNSQVFQELS - S YNNVLSNNLNDNVTTLTI IS IGI S I 244 

+++ +D IE Q+ M + +QV + + N++ +NNLND + LTI S+ +++ 
Sbjct: 196 DQDKIDLFEDVQIEGEQVQRMFEIETQVVDRIDHTMSIANNNLNDTMKFLTIWSLTMAV 255 

Query: 245 IAMVTSFYGMNVKLPFDSVDAVWVLIILITTIITIMLSIVMYI 287 

+++ FYGMNVKLP + W+L + 1+ ++ + + I++ + 
Sbjct: 256 PTI I SGFYGMNVKLPIAGMQYAWMLTLGISVVLIVAMLIMIiKV' 298 

SEQ ID 714 (GBS422) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 172 (lane 7; MW 60kDa). 

GBS422-GST was purified as shown in Figure 219, lane 12. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
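
Each alignment row carries its own start and end coordinates, so the number of residues printed on a row should equal end minus start plus one, with gap characters not counted. The following sketch shows one possible check along these lines, which can help flag rows where characters were lost or duplicated in transcription. The function name `row_is_consistent` is an assumption made for this example, and the check says nothing about whether individual letters are correct.

```python
import re

# "Query: 245 IAMVTS...MYI 287" -> label, start, sequence text, end
ROW_RE = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(.+?)\s+(\d+)\s*$")


def row_is_consistent(line):
    """True if the residue count on the row matches its coordinates."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError("not an alignment row: %r" % line)
    start, end = int(match.group(2)), int(match.group(4))
    # Count letters only, so gap marks ("-") and stray spaces are ignored.
    residues = sum(1 for ch in match.group(3) if ch.isalpha())
    return residues == end - start + 1
```

The last Query row of the pNZ4000 alignment above (residues 245 to 287) passes this check, whereas a row with a dropped residue would not.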

Example 226 

A DNA sequence (GBSx0240) was identified in S.agalactiae <SEQ ID 717> which encodes the amino acid 
sequence <SEQ ID 718>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0783 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB61731 GB:AL133220 putative oxidoreductase [Streptomyces 
coelicolor A3(2)] 

Identities = 100/306 (32%), Positives = 152/306 (48%), Gaps = 3/306 (0%) 

Query: 3 KWYGWSTAKVAPRFIEGTOIAGNGEVVAVSSRTLESAQAFANKYHLPKAYDKLEDMLA 62 

KVR+G+++T +A RF + + EWAV+SRT SA+ FA ++ +P+AY E + 
Sbjct: 8 KVRWGILATGGMAARFTADLVDLPDAEVVAVASRTEASAKTFAERFGIPRAYGGWETLAR 67 

Query: 63 DESIDVIYVATINQDHYKVAKAALIAGKHVLVEKPFTLTYDQANELFALAESCNLFLMEA 122 

DE +DV+YVAT + H A L AG++VL EKPFTL +A EL ALA +FLMEA 
Sbjct: 68 DEDvnWWATPHSAHRTAAGLCLEAGRNVLCEKPFTLNARFAAELVAIARENGVFLMEA 127 

Query: 123 QKSVFI PMTQVIKKLLASGEIGEVISISSTTAYPN- IDHVTWFRELELGGGTVHFMAPYA 181 

P+ + +K+L+A G IGEV S+ + R+ GGG + + Y 

Sbjct: 128 ^lWMYCNPLVRRLKELVADGAIGEVRSLQADFGIIAGPFPAftHRLRDPAQGGGALLDLGVYP 187 

Query: 182 LSYLQYLFDATITHASGTATFPKGQSDSQSKLLLQLSNG VL VDI FLTTRLNLPHEMI IYG 241 

+S+ Q L T + A + D Q+ LL N L I + P+ I G 

Sbjct: 188 VSFAQLLLGEP-TDVAARAVLSEEGVDLQTGAIiLSYGNDALASIHCSITGGTPNSASITG 246 

Query: 242 TEGRLIIPH-FWKTTHAKLVRNDTSARTIQVDMVSDFEKEAYHVSQMILEGQRVSHIMTP 300 

+EGR+ +P+ F+ H L R + + D + H ++ ++ R +P 

Sbjct: 247 SEGRIDVPNGFFFPDHFVLHRTGRDPQEFRADPADGPRESLRHEAEEVMRALRAGETESP 306 

Query: 301 QLTLSG 306 

Sbjct: 307 LVPLDG 312 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 227 

A DNA sequence (GBSx0241) was identified in S.agalactiae <SEQ ID 721> which encodes the amino acid 
sequence <SEQ ID 722>. This protein is predicted to be valyl-tRNA synthetase (valS). Analysis of this 
protein sequence reveals the following: 

Possible site: 36 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.00 Transmembrane 794 - 810 ( 794 - 810) 

Final Results 

bacterial membrane Certainty=0.1001 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA57558 GB:L08854 valyl-tRNA synthetase [Lactobacillus casei] 
Identities = 543/881 (61%), Positives = 679/881 (76%), Gaps = 12/881 (1%) 

Query: 5 LSPKYNPAEVEEGRYQTWLDQDVFKPSGDTEAKPYSIVIPPPNVTGKLHLGHAWDTTLQD 64 
L+PKY+ VEEGRYQ WLD+DVFKPSGD +AKPYSIVIPPPNVTGKLH+GHAWDTTLQD 
Sbjct: 27 LAPKYDHKAVEEGRYQEWLDEDVFKPSGDKKAKPYSIVIPPPNVTGKLHMGHAWDTTLQD 86 

Query: 65 IIIRQKRMQGFDTLWLPGMDHAGIATQAKVEERLREQGISRYDLGREKFLDKVWEWKDEY 124 
I+IRQKR++GFDTLWLPGMDHAGIATQAKVE +LR++GISRYDLGREKF+ KVWEWKDE+ 
Sbjct: 87 IVIRQKRIEGFDTLWLPGMDHAGIATQAKVEAKLRKEGISRYDLGREKFVQKVWEWKDEF 146 

Query: 125 [sequence line not recovered] 
A TI QW KMGLS+DYSRERFTLD+GL++AVR+VFVDLYN+G IYRGE+I+NWDP ART 
Sbjct: 147 [sequence line not recovered] 

Query: 185 [sequence line not recovered] 
ALSDIEVIHKD +GAFYH+ Y DGS +E+ATTRPETM GD AVAV+P D RYKD++G 
Sbjct: 207 [sequence line not recovered] 

Query: 245 [sequence line not recovered] 
+ ILP+ N+ IPI+ D + DPEFGTG VKITPAHDPNDF VG RH+L ++N MNDDGTM 
Sbjct: 267 [sequence line not recovered] 

Query: 305 [sequence line not recovered] 
NE A ++ GMDRFEARKA+VA L+ G L+K++ HSVGHSERTGV VE RLSTQWFVK 
Sbjct: 327 [sequence line not recovered] 

Query: 365 [sequence line not recovered] 
M LA+ AI A Q+ + KV F P RF T++ WMEN+HDWVI SRQLWWGHQI PAWYN 
Sbjct: 387 [sequence line not recovered] 

Query: 423 [sequence line not recovered] 
GE YVG +AP+ + W QD DVLDTWFSSALWPFSTMGWP+T+A D+KRY+PT TLVTGY 
Sbjct: 447 [sequence line not recovered] 

Query: 482 [sequence line not recovered] 
DII FWV+RMIFQ L FT ++PF LIHGL+RDE+GRKMSKSLGNGIDPMDVIEKYGAD 
Sbjct: 507 [sequence line not recovered] 

Query: 542 [sequence line not recovered] 
ALRWFL G+ PGQD RFSY++++A+WNFINKIWNISR+++MN L Q + 
Sbjct: 567 ALRWFLITGNKPGQDTRFSYKQVEAAWNFINKIWNISRFWMNLGDLDTPQQPD 620 

Query: 602 NSQVGNVTDRWILHNLNETVGKVTENFDKFEFGVAGHILYNFIWEEFANWYVELTKEVLY 661 
+++D+W+ LNET+ +V + +FEFG G LYNF W A+WYVE++KEVLY 
Sbjct: 621 -PSTFDLSDKWLFAQLNETIKQVMDLSARFEFGEMGRTLYNFTWNVIADWYVEMSKEVLY 679 

Query: 662 SDNEDEKVITRSVLLYTLDQILRLLHPIMPFVTEEIF--GQYAEGSIVLASYPQVNATFE 719 
D+E K R L Y LDQILRLLHP+MPFV +++ + SIV ASYP N FE 
Sbjct: 680 GDDEQAKAAKRWIAYALDQILRLLHPVMPFVHGKLWLALPHTGKSIOTASYPVANTAFE 739 

Query: 720 NQTAHKGVESLKDLIRSVRNSRAEVNVAPSKPITILVKTSDSELESFFKDNSNYIKRFTN 779 
N A ++++ LIR VR R E + ILVK +D L+ F+ N ++I RF N 
Sbjct: 740 NADATSAMDAIIALIRGVRGIRKEAGAPLKTKVDILVKLTDPALKPIFEQNFDFIDRFVN 799 

Query: 780 PETLEISSAIATPEIAMSSVITGAEIFLPLADLLNVEEELARLEKELAKWQKELDMVGKK 839 
+ + + +A P++A S+VITGA IF+PL +L++++EE A+L K+ K ++E+ + KK 
Sbjct: 800 SKAFWGTDVAEPKMAGSAVITGATIFVPLNELIDLDEEKAKLTKDAKKLEQEIARIDKK 859 

Query: 840 LSNERFVANAKPEWQKEKDKQTDYQTKYDATIARIEEMKK 880 
L+N+ F++ A W +++ K++D++ + +T R+E++++ 
Sbjct: 860 LNNQGFLSKAPEAWAEQRTKRSDFEDQLTSTKQRLEQLQR 900 






A related DNA sequence was identified in S.pyogenes <SEQ ID 723> which encodes the amino acid 
sequence <SEQ ID 724>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.5062 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 782/878 (89%), Positives = 818/878 (93%) 

Query: 4 ELSPKYNPAEVEEGRYQTWLDQDVFKPSGDTEAKPYSIVIPPPNVTGKLHLGHAWDTTLQ 63 
ELSPKYNPAEVE GRYQ WLD DVFKPSGD +AKPYSIVIPPPNVTGKLHLGHAWDTTLQ 
Sbjct: 3 ELSPKYNPAEVEAGRYQKWLDADVFKPSGDQKAKPYSIVIPPPNVTGKLHLGHAWDTTLQ 62 

Query: 64 DIIIRQKRMQGFDTLWLPGMDHAGIATQAKVEERLREQGISRYDLGREKFLDKVWEWKDE 123 
DIIIRQKRMQGFDTLWLPGMDHAGIATQAKVEERLREQGISRYDLGR+KFLDKVWEWKDE 
Sbjct: 63 DIIIRQKRMQGFDTLWLPGMDHAGIATQAKVEERLREQGISRYDLGRDKFLDKVWEWKDE 122 

Query: 124 YAATIKSQWGKMGLSVDYSRERFTLDEGLSKAVRKVFVDLYNKGWIYRGEFIINWDPAAR 183 
YA TIK QWGKMGLSVDYSRERFTLDEGLSKAVRKVFVDLY KGWIYRGEFIINWDPAAR 
Sbjct: 123 YATTIKEQWGKMGLSVDYSRERFTLDEGLSKAVRKVFVDLYKKGWIYRGEFIINWDPAAR 182 

Query: 184 TALSDIEVIHKDVEGAFYHMNYMLEDGSRALEVATTRPETMFGDVAVAVNPEDARYKDLI 243 
TALSDIEVIHKDVEGAFYHMNYMLEDGSRAL+VATTRPETMFGDVAVAVNPED RYKDLI 
Sbjct: 183 TALSDIEVIHKDVEGAFYHMNYMLEDGSRALQVATTRPETMFGDVAVAVNPEDPRYKDLI 242 

Query: 244 GQNVILPIINKPIPIVADEHADPEFGTGVVKITPAHDPNDFAVGQRHNLPQVNVMNDDGT 303 
G+NVILPI+NK IPIV DEHADPEFGTGVVKITPAHDPNDF VGQRHNLPQVNVMNDDGT 
Sbjct: 243 GKNVILPIVNKLIPIVGDEHADPEFGTGVVKITPAHDPNDFEVGQRHNLPQVNVMNDDGT 302 

Query: 304 MNELADEFNGMDRFEARKAVVAKLESLGNLVKIKKTTHSVGHSERTGVVVEPRLSTQWFV 363 
MNELA +F GMDRFEAR+A VAKLE LG LV I+K HSVGHSER+G VVEPRLSTQWFV 
Sbjct: 303 MNELAGDFAGMDRFEARQATVAKLEELGALVNIEKRVHSVGHSERSGAVVEPRLSTQWFV 362 

Query: 364 KMDQLAKNAIANQDTEDKVEFYPPRFNDTFMSWMENVHDWVISRQLWWGHQIPAWYNVNG 423 
KMD+LAK A+ NQ+T+D+V+FYPPRFNDTF+ WMENVHDWVISRQLWWGHQIPAWYN G 
Sbjct: 363 KMDELAKQA^NQETDDRVDFYPPRFNDTFLQWMENVHDWVISRQLWWGHQIPAWYNAEG 422 

Query: 424 EMYVGEDAPEGDGWTQDEDVLDTWFSSALWPFSTMGWPDTEAADFKRYFPTSTLVTGYDI 483 
E+YVGE+APEGD WTQDEDVLDTWFSSALWPFSTMGWPDT+ DFKRYFPTSTLVTGYDI 
Sbjct: 423 EIYVGEEAPEGDDWTQDEDVLDTWFSSALWPFSTMGWPDTDVEDFKRYFPTSTLVTGYDI 482 

Query: 484 IFFWVSRMIFQSLEFTGRQPFSNVLIHGLIRDEEGRKMSKSLGNGIDPMDVIEKYGADAL 543 
IFFWVSRMIFQSLEFTGRQPF NVLIHGLIRDEEGRKMSKSLGNGIDPMDVIEKYGAD+L 
Sbjct: 483 IFFWVSRMIFQSLEFTGRQPFQNVLIHGLIRDEEGRKMSKSLGNGIDPMDVIEKYGADSL 542 

Query: 544 RWFLSNGSAPGQDVRFSYEKMDASWNFINKIWNISRYILMNNEGLTLDQARENVEKVVNS 603 
RWFLSNGSAPGQDVRFSYEKMDASWNFINKIWNISRYILMNNEGLTL+ A NV KV S 
Sbjct: 543 RWFLSNGSAPGQDVRFSYEKMDASWNFINKIWNISRYILMNNEGLTLEDAESNVAKVAAS 602 

Query: 604 QVGNVTDRWILHNLNETVGKVTENFDKFEFGVAGHILYNFIWEEFANWYVELTKEVLYSD 663 
+ GNVTD+WILHNLNET+ KVTENFDKFEFGVAGHILYNFIWEEFANWYVELTKEVLYSD 
Sbjct: 603 EAGNVTDQWILHNLNETIAKVTENFDKFEFGVAGHILYNFIWEEFANWYVELTKEVLYSD 662 

Query: 664 NEDEKVITRSVLLYTLDQILRLLHPIMPFVTEEIFGQYAEGSIVLASYPQVNATFENQTA 723 
NE EKVITRSVLLYTLD+ILRLLHPIMPFVTEEI+ QYA+GSIV YP V FEN+ A 
Sbjct: 663 NEAEKVITRSVLLYTLDKILRLLHPIMPFVTEEIYAQYAQGSIVTVDYPVVRPAFENEAA 722 

Query: 724 HKGVESLKDLIRSVRNSRAEVNVAPSKPITILVKTSDSELESFFKDNSNYIKRFTNPETL 783 

HKGVESLKDLIR+VRN+RAEVNVAPSKPITILVKT+DSELE FF N NYIK FTNPE L 
Sbjct: 723 HKGVESLKDLIRAVRNARAEVNVAPSKPITILVKTADSELEDFFNSMINYIKCFTNPEKL 782 

20 

Query: 784 EISSAIATPEIAMSSVITGAEIFLPLADLIJWEEELARLEKEIiAKWQKELDMVGKKLSNE 843 

EISSAIA PELAM+S+ITGAEI+LPLADLIiNVEEELARL+KEIAKWQKELDMVGKKL NE 
Sbjct: 783 EI SSAI AAPELAMTS 1 1 TGAE I YLPLADLLNVEEELARLDKELAKWQKELDMVGKKLGNE 842 

Query: 844 RFVANAKPEWQKEKDKQTDYQTKYDATIARIEEMKKL 881

RFVANAKPEWQKEKDKQ DYQ KYDAT RI EMKK+ 
Sbjct: 843 RFVANAKPEWQKEKDKQADYQAKYDATQERIAEMKKI 880 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 228 

A DNA sequence (GBSx0242) was identified in S.agalactiae <SEQ ID 725> which encodes the amino acid 
sequence <SEQ ID 726>. Analysis of this protein sequence reveals the following: 

Possible site: 30 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0669 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 727> which encodes the amino acid 
sequence <SEQ ID 728>. Analysis of this protein sequence reveals the following: 

Possible site: 57

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below:

Identities = 148/191 (77%) , Positives = 165/191 (85%) 
Query: 14 GEKKKMNIIIIGAQASGKMTIGQEIAKQTGMTLFHNHDSIDFVLRFMPWSPDSIALTESI 73 
G + KMN+IIIGAQASGKMTIGQE+A+QTGMTLFHNHDSIDFVLRFMPWS +S AL E I
Sbjct: 3 GAETK^IIIGAQASGKMTIGQEVARQTGMTLFHNHDSIDFVLRFMPWSQESTALIERI 62 

Query: 74 RFKFFETFAKTGQEMIFTIVIDFNDSRDWFLEKIQIVFQSHNQEVLFVELETELSERLK 133 
RF FFETFAKTGQ+MIFTIVIDFND DV LEKIQ VFQS++QEVLFVEL+T++ ERLK

Sbjct: 63 RFAFFETFAKTGQDMIFTIVIDFNDPNDVAMLEKIQAVFQSYDQEVLFVELKTDIEERLK 122 

Query: 134 RNRTENRLKHKPSKRDIKWSESDICSTMDYAIFNPEVAPEALTYYHKINNTCLTATETAY 193 
RNRTENRLKHKP KR+I+WSE DI STM YA+FNPE P+ LT+Y KINNT LTA ETA 
Sbjct: 123 RNRTENRLKHKPLKRNIEWSEQDIQSTMAYAVFNPEEPPKTLTHYQKINNTQLTAAETAQ 182

Query: 194 LIIQKINQIKE 204 

LIIQK+ IKE 
Sbjct: 183 LIIQKMTHIKE 193 
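The Identities and Positives figures quoted for an alignment can be recounted from its three rows. The following is a minimal sketch, not part of the original analysis: the function names `alignment_stats` and `percent` are hypothetical, the rows are assumed to be of equal length with the midline spacing intact (the extracted text has collapsed many spaces), and the use of integer truncation for percentages is an assumption based on figures in this document such as 57/84 being reported as 67%.

```python
def alignment_stats(query, midline, sbjct):
    """Count identities, positives and gaps for one aligned block.

    The three strings must have equal length. '-' marks a gap; the midline
    shows a letter for an identity, '+' for a conservative substitution
    and ' ' for a mismatch or gap.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("aligned rows must have equal length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    positives = sum(1 for m in midline if m != " ")  # identities plus '+'
    gaps = sum(1 for q, s in zip(query, sbjct) if "-" in (q, s))
    return identities, positives, gaps, len(query)


def percent(n, total):
    # Integer truncation (assumed, see the lead-in above).
    return 100 * n // total


# Final block of the alignment above, with the midline spacing restored.
ident, pos, gaps, length = alignment_stats(
    "LIIQKINQIKE", "LIIQK+  IKE", "LIIQKMTHIKE"
)
print(f"Identities = {ident}/{length} ({percent(ident, length)}%), "
      f"Positives = {pos}/{length} ({percent(pos, length)}%)")
```

Summing the per-block counts over all blocks of an alignment gives totals that can be compared with the header line.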


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 229 

A DNA sequence (GBSx0243) was identified in S.agalactiae <SEQ ID 729> which encodes the amino acid 
sequence <SEQ ID 730>. Analysis of this protein sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3614 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04556 GB:AP001510 unknown conserved protein [Bacillus halodurans]

Identities = 60/189 (31%) , Positives = 102/189 (53%) , Gaps = 3/189 (1%) 

Query: 7 EIVDNQLPVVETNRLLLRQRKLEDAKEIFEFVKLDEVSYPAGFPAVKSLEEEITYIQEIY 66 
E + LP +ET RL LR+ +DA I+++ ++V+ + +S+++ ++ + 
Sbjct: 4 EDIYGDLPTLETERLRLRKFYKDDAAAIYDYASNEQVTKYVLWETHQSIKDSEAFLA--F 61

Query: 67 PTNLEKEKLPSGYAITLKGDDKVIGS VDFNH - RHEDDI FE IG YLLHPDYWGQG I VPEAAS 125 

N EK S +AI LK ++++IG+VDF + +D E+GY+L YWGQGI+ EA + 
Sbjct: 62 ALNKYDEKDVSPWAIELKFJSTERMIGTVDFVWWKPKDKTAEL^ 121 


Query: 126 ALVEIGFTLLGLHKIELGCYDYNKQSQAVARKLGFTLEANIRDRRDAQGKRCGDMRFGLL 185 

ALVE GF + L +1+ C+NSVKG E R +G + ++ 

Sbjct: 122 ALvEFGFNNMELERIQAKCFAENISSARVMEKAGLIYEGTHRRAIYVKGAHRDFKVYAII 181 

Query: 186 RSEWEKKRR 194

R ++E+K + 
Sbjct: 182 REDYEQKHQ 190 

A related DNA sequence was identified in S.pyogenes <SEQ ID 731> which encodes the amino acid
sequence <SEQ ID 732>. Analysis of this protein sequence reveals the following:

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1864 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



An alignment of the GAS and GBS proteins is shown below: 
Identities = 50/58 (86%), Positives = 56/58 (96%)

Query: 137 LHKIELGCYDYNKQSQAVARKLGFTLEANIRDRRDAQGKRCGDMRFGLLRSEWEKKRR 194
LHKIELGCYDYNKQSQAVARKLGFTLEAN RDR+D QG+RCGDMRFGLLRSEWE++++
Sbjct: 1 LHKIELGCYDYNKQSQAVARKLGFTLEANARDRKDVQGRRCGDMRFGLLRSEWEEQKQ 58

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 230 

A DNA sequence (GBSx0244) was identified in S.agalactiae <SEQ ID 733> which encodes the amino acid
sequence <SEQ ID 734>. This protein is predicted to be ribosomal-protein-alanine N-acetyltransferase. 
Analysis of this protein sequence reveals the following: 






Possible site: 54 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.4066 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9599> which encodes amino acid sequence <SEQ ID 9600> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04418 GB:AP001509 ribosomal-protein-alanine 
N-acetyltransferase [Bacillus halodurans]

Identities = 63/185 (34%) , Positives = 95/185 (51%) , Gaps = 11/185 (5%) 

Query: 53 KALPKLETDRLILRQRTVGDVPAMFDYVCLEEVAYPAGLSPIASLEDEYDYFENRYYQNL 112 
K P LET RLILR+ T D ++ Y+ +EV GL P +LED E +Y+++ 

Sbjct: 6 KRFPILETKRLILRKITTDDARSILSYLSDKEVMKYFGLEPFQTLEDALG- -EIAWYESI 63

Query: 113 EKAKLPSGYGITVKGSDRIIGSCAFN HRHEDDVFE I CYLLHPDYWGHGYMTEAVA 167 

+ +GIT+KG D +IGSC F+ H + FE+ L YWG G +EA+ 
Sbjct: 64 LHEQTGIRWGITLKGQDEVIGSCGFHQWVPKHHRAEIGFELSKL YWGQGIASEAIR 119 


Query: 168 ALlEVGFTLLNLHKIEIRCYDYNKQSRRVAEKIjGFTLEATIRDRKDNQDNRCVNLIYGLL 227 

A+1+ GF L L +1+ N S+R+ EK GF E +R + +Y LL 

Sbjct: 120 AVIQYGFEHLELQRIQALIEPPNIPSQRLVEKQGFISEGLLRSYEYTCGKFDDLYMYSLL 179 

Query: 228 RSEWE 232

+ +++ 

Sbjct: 180 KRDFD 184 

There is also homology to SEQ ID 732: 

Identities = 39/54 (72%), Positives = 44/54 (81%)

Query: 179 LHKIEIRCYDYNKQSRRVAEKLGFTLEATIRDRKDNQDNRCVNLIYGLLRSEWE 232

LHKIE+ CYDYNKQS+ VA KLGFTLEA RDRKD Q RC ++ +GLLRSEWE 
Sbjct: 1 LHKIELGCYDYNKQSQAVARKLGFTLEANARDRKDVQGRRCGDMRFGLLRSEWE 54 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 231

A DNA sequence (GBSx0245) was identified in S.agalactiae <SEQ ID 735> which encodes the amino acid 
sequence <SEQ ID 736>. Analysis of this protein sequence reveals the following: 

Possible site: 51 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2719 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 232 

A DNA sequence (GBSx0246) was identified in S.agalactiae <SEQ ID 737> which encodes the amino acid
sequence <SEQ ID 738>. Analysis of this protein sequence reveals the following:

Possible site: 53 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3250 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9597> which encodes amino acid sequence <SEQ ID 9598> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 739> which encodes the amino acid
sequence <SEQ ID 740>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.3293 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below:

Identities = 24/55 (43%) , Positives = 38/55 (68%) 

Query: 56 LLEGLTANKQDVljKEAGLVSLEAFAKVSFJUJVLALKGIGPAAIKQLVDNGVVFAK 110 
++ G+ ++ + L G+ S +AF + +E D+LALKGIGPA +K+LV+NG F K 
Sbjct: 77 WAGIRSDLvETLYAEGIHSAQAFKEWTEKDIiLALKGIGPAWKKLVENGASFKK 131



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 233

A DNA sequence (GBSx0247) was identified in S.agalactiae <SEQ ID 741> which encodes the amino acid 
sequence <SEQ ID 742>. Analysis of this protein sequence reveals the following: 

Possible site: 25 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2901 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 743> which encodes the amino acid 
sequence <SEQ ID 744>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2535 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below: 

Identities = 57/84 (67%) , Positives = 73/84 (86%) 


Query: 1 MSYEQEFLKTJFEEWLQSQISINQMAMSAKKVLEEDKDERAADAYIRYESKLDAYRFLQG 60 

MSYE+EFLKDFE+W+++QI +NQ+AM ++++V +ED DERA DA+IRYESKLDAY FL G 
Sbjct: 1 MSYEKEFLKDFEDWVKTQIQVNQLAMATSQEVAQEDGDERAKDAFIRYESKLDAYEFLLG 60 

Query: 61 KFNNYHNQKSFHDLPDGLFGQRHY 84

KF+NY N K+FHD+PD LFG RHY
Sbjct: 61 KFDNYKNGKAFHDIPDELFGARHY 84

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 234 

A DNA sequence (GBSx0248) was identified in S.agalactiae <SEQ ID 745> which encodes the amino acid 
sequence <SEQ ID 746>. This protein is predicted to be methyltransferase. Analysis of this protein 
sequence reveals the following: 

Possible site: 61

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2469 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related DNA sequence was identified in S.pyogenes <SEQ ID 747> which encodes the amino acid 
sequence <SEQ ID 748>. Analysis of this protein sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3352 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below: 

Identities = 26/60 (43%) , Positives = 37/60 (61%) 

Query: 23 LKNERCPHPKLINVLERKLEIILGDQKHILEKDSLISLSPQETHHLRAIENSKFLQIELD 82 

+ E P K+I VLE +L L DQK +L ++SLI++ Q+ HHL A + K LQ+ LD 
Sbjct: 42 ISQETSPRDKVILVLEGQLIFDLEDQKQVLTQESLIAIPAQKVHHLEAKTDCKLLQVLLD 101 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 235 

A DNA sequence (GBSx0249) was identified in S.agalactiae <SEQ ID 749> which encodes the amino acid 
sequence <SEQ ID 750>. This protein is predicted to be integrase (codV). Analysis of this protein sequence
reveals the following: 

Possible site: 59
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3842 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 236 

A DNA sequence (GBSx0250) was identified in S.agalactiae <SEQ ID 751> which encodes the amino acid 
sequence <SEQ ID 752>. Analysis of this protein sequence reveals the following:

Possible site: 22 

>>> May be a lipoprotein 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 752 (GBS 128) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 23 (lane 5; MW 15kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 32 (lane 4; 2 bands). 

The GBS128-GST fusion product was purified (Figure 198, lane 2) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 288), which confirmed that the protein is immunoaccessible
on GBS bacteria. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 237 

A DNA sequence (GBSx0251) was identified in S.agalactiae <SEQ ID 753> which encodes the amino acid 
sequence <SEQ ID 754>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.2940 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 755> which encodes the amino acid 
sequence <SEQ ID 756>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2518 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below: 

Identities = 30/90 (33%) , Positives = 49/90 (54%) , Gaps = 10/90 (11%) 

Query: 3 TVAVRVDDQLKDDATELFQSLGLDMSTAVKMFL I QSVKTQS I PFE I K NKSSV 54 

T+ +RVDD +K A ++ + LG+ MSTA+ MFL Q + T IPF++ N + 

Sbjct: 15 TUJLRVDDSVKSAADDILKRLGIPMSTAIDMFLNQIILTGGIPFDVSLPFAPQRVNVDYM 74 

Query: 55 SDEEFQNLVETKLKGIRVKASDPESVNAFF 84 

S E+F + + T + K +P+ V F+ 
Sbjct: 75 SQEKFYDKLITSFED- -AKTCNPQDVGKFY 102 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 238 

A DNA sequence (GBSx0252) was identified in S.agalactiae <SEQ ID 757> which encodes the amino acid 
sequence <SEQ ID 758>. This protein is predicted to be surface protein Rib. Analysis of this protein 
sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.81 Transmembrane 370 - 386 ( 368 - 388) 

Final Results 

bacterial membrane Certainty=0.2126 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco
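The "Final Results" lines can be read programmatically even where the extracted text has stray spaces inside the certainty value. The following is a minimal sketch under stated assumptions: the names `parse_localization` and `best_site` are hypothetical, and the regular expression is written against the line shapes seen in this document rather than any formal specification of the prediction tool's output.

```python
import re

# Matches e.g. "bacterial membrane Certainty=0. 2126 (Affirmative) < suco".
# Whitespace inside the number is tolerated because the extracted text often
# splits "0.2126" into "0. 2126" or "0 . 2126".
LINE_RE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_localization(lines):
    """Return {site: (certainty, label)} for each recognisable result line."""
    results = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            site, raw_value, label = match.groups()
            value = float(re.sub(r"\s+", "", raw_value))
            results[site] = (value, label.strip())
    return results


def best_site(results):
    """Site with the highest certainty (first one wins on a tie)."""
    return max(results, key=lambda site: results[site][0])


example = [
    "bacterial membrane Certainty=0. 2126 (Affirmative) < suco",
    "bacterial outside Certainty=0. 0000 (Not Clear) < suco",
    "bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco",
]
parsed = parse_localization(example)
print(best_site(parsed), parsed[best_site(parsed)])
```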
A related GBS nucleic acid sequence <SEQ ID 9593> which encodes amino acid sequence <SEQ ID 9594>
was also identified. A related GBS nucleic acid sequence <SEQ ID 10773> which encodes amino acid 
sequence <SEQ ID 10774> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 759> which encodes the amino acid 
sequence <SEQ ID 760>. Analysis of this protein sequence reveals the following:

Possible site: 37 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.57 Transmembrane 354 - 370 ( 353 - 371) 

Final Results

bacterial membrane Certainty=0.2826 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

LPXTG motif: 344-348

An alignment of the GAS and GBS proteins is shown below: 

Identities = 64/277 (23%) , Positives = 99/277 (35%) , Gaps = 31/277 (11%) 

Query: 126 SIGNLPDLPKGTTOAFETPVDTATPGDKPATOWTYPDGSKDTVDVTvTWVDPRTDADKN 185

++ +LP +TTEPV +V +D+ + T PA 

Sbjct: 121 AVKDLPASTESTTQPVEAPVQETQASASDSMVTGDSTSVTTDSPEETPSSESPVAPALSE 180 

Query: 186 DPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVAFETPVDTATPGDKPAKVWTYPDGSK 245 
PAQEPSPTA ETP + A P P + S+

Sbjct: 181 APA QPAESEEPSVAASSEETPS--PSTPAAPETPEEPAAPSPSPESEEPSVAAPSE 234 

Query: 246 DTVDVTVKVVDPRTDADKNDPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVAFETPVDT 305 
+T P A + PA ++ T + P P + +TP 

Sbjct: 235 ETPSPET PEEPAAPSQPAESEESSVAATTSPS PSTPAESET- -QTPPAV 281

Query: 306 ATPGDKPAKWVTYPDGSKDTTOVTVKVVDPRTDADK NDPAGKDQQVNGK 355 

DKP+ PS + TV+ + +DK N + + + 

Sbjct: 282 TKDSDKPSSAAEK-PAASSLVSEQTVQQPTSKRSSDKKEEQEQSYSPNRSLSRQVRAHES 340 


Query: 356 GNKLPATGENATPFFNWALTIMSSVGLLSVSKKKED 392 

G LP+TGE A P F + +T+MS G L V+K++++ 
Sbjct: 341 GKYLPSTGEKAQPLF - 1 ATMTLMSLFGSLL VTKRQKE 376 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 239 

A DNA sequence (GBSx0253) was identified in S.agalactiae <SEQ ID 761> which encodes the amino acid 
sequence <SEQ ID 762>. This protein is predicted to be surface protein Rib. Analysis of this protein 
sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.5289 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 240 

A DNA sequence (GBSx0254) was identified in S.agalactiae <SEQ ID 763> which encodes the amino acid 
sequence <SEQ ID 764>. This protein is predicted to be surface protein Rib. Analysis of this protein 
sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.06 Transmembrane 39 - 55 ( 39 - 55) 



Final Results 

bacterial membrane Certainty=0.1426 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9591> which encodes amino acid sequence <SEQ ID 9592> 
was also identified. 

The protein differs significantly from U58333 in several places: 

Query: 157 TKPDGQVDI VNVSLTIYNSSALRDKIDEVKK KAED PKWDEGSRDK 201 

T PDG D V+V++ + + DK D K KAED P +G+

Sbjct: 683 TYPDGSKDTVDVTVKAA7DPRTDADKNDPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVA 742 

Query: 202 VLISLDDIKTDIDNNPK TQSDIATOITEvTNLEKILVPRIPDADKNDPAGKDQQVNV 258 

+D T D K T D + +VT K++ PR DADKNDPAGKDQQVNV 
Sbjct: 743 FETPVDTA- TPGDKPAKVVVTY PDGSKDTVDVT- -VKWDPRT-DADKNDPAGKDQQ VNV 798

Query: 157 TKPDGQVDIVNVSLTIYNSSALRDKIDEVKK KAED PKWDEGSRDK 201 

T PDG D V+V++ + + DK D K KAED P +G+ 

Sbjct: 841 TYPDGSKDTVDVTVKVVDPRTDADKNDPAGKDQXJVNVGETPKAEDSIG^PDLPKGTTVA 900 


Query: 202 VLISLDDIKTDIDNNPK TQSDIANKITEVTNLEKILVPRIPDADKNDPAGKDQQVNV 258 

+D T D K T D + +VT K++ PR DADKNDPAGKDQQVNV 
Sbjct: 901 FETPVDTA- TPGDKPAKVVVTYPDGSKDTVDVT--VKVVDPRT-DADKNDPAGKDQQVNV 956 

Query: 157 TKPDGQVDIVNVSLTIYNSSALRDKIDEVKK KAED PKWDEGSRDK 201

T PDG D V+V++ + + DK D K KAED P +G+ 

Sbjct: 288 TYPDGSKDTVDVTVKVVDPRTDADKNDPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVA 347 

Query: 202 VLISLDDIKTDIDNNPK- - -TQSDIANKITEVTNLEKILVPRIPDADKNDPAGKDQQVNV 258 
+D T D K T D + +VT K++ PR DADKNDPAGKDQQVNV

Sbjct: 348 FETPVDTA-TPGDKPAKVWTYPDGSKDTVDVT- -VKWDPRT-DADKNDPAGKDQQVNV 403 

Query: 157 TKPDGQVDIVNVSLTIYNSSALRDKIDEVKK KAED PKWDEGSRDK 201 

T PDG D V+V++ + + DK D K KAED P +G+ 

Sbjct: 604 TYPDGSKDTVDVTVKVVDPRTDADKNDPAGKDQQVNVGETPICAEDSIGNLPDLPKGTTVA 663

Query: 202 VLISLDDIKTDIDNNPK- - -TQSDIANKITEVTNLEKILVPRIPDADKNDPAGKDQQVNV 258 

+D T D K T D + +VT K++ PR DADKNDPAGKDQQVNV 
Sbjct: 664 FETPVDTA-TPGDKPAKVVVTYPDGSKDTVDVT--VKVVDPRT-DADKNDPAGKDQQVNV 719 


Query: 157 TKPDGQVDIVNVSLTIYNSSALRDKIDEVKK KAED PKWDEGSRDK 201 

T PDG D V+V++ + + DK D K KAED P +G+ 

Sbjct: 446 TYPDGSKDTVDVTVKVVDPRTDADKNDPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVA 505 

Query: 202 VLISLDDIKTDIDNNPK TQSDIANKITEVTNLEKILVPRIPDADKNDPAGKDQQVNV 258

+D T D K T D + +VT K++ PR DADKNDPAGKDQQVNV 
Sbjct: 506 FETPVDTA-TPGDKPAKVVVTYPDGSKDTVDVT--VKVVDPRT-DADKNDPAGKDQQVNV 561 

Query: 157 TKPDGQVDIVNVSLTIYNSSALRDKIDEVKK KAED PKWDEGSRDK 201 

T PDG D V+V++ + + DK D K KAED P +G+
Sbjct: 920 TYPDGSKDTVDVTVKAA7DPRTDADKNDPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVA 979

Query: 202 VLISLDDIKTDIDNNPK TQSDIANKITEVTNLEKILVPRIPDADKNDPAGKDQQVNV 258 

+D T D K T D + +VT K++ PR DADKNDPAGKDQQVNV 
Sbjct: 980 FETPVDTA-TPGDKPAKOT\m:PDGSKDTVDVT--VKVVDPRT-DADKiroPAGKDQQVlW 1035

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 241

A DNA sequence (GBSx0255) was identified in S.agalactiae <SEQ ID 765> which encodes the amino acid 
sequence <SEQ ID 766>. This protein is predicted to be ara-C-like activator. Analysis of this protein 
sequence reveals the following: 

Possible site: 30 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.37 Transmembrane 8 - 24 ( 8 - 25) 

Final Results 

bacterial membrane Certainty=0.1150 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9589> which encodes amino acid sequence <SEQ ID 9590> 
was also identified. 

There is homology to SEQ ID 460.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 242 

A DNA sequence (GBSx0256) was identified in S.agalactiae <SEQ ID 767> which encodes the amino acid 
sequence <SEQ ID 768>. Analysis of this protein sequence reveals the following:

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1200 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9587> which encodes amino acid sequence <SEQ ID 9588> 
was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 769> which encodes the amino acid 
sequence <SEQ ID 770>. Analysis of this protein sequence reveals the following: 






Possible site: 50 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.0679 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco
An alignment of the GAS and GBS proteins is shown below:

Identities = 135/176 (76%) , Positives = 161/176 (90%) 

Query: 1 MSYMVKDRQIQKTK^AIYNAFISLLQEMDYSKlWQDVIGLAWGRSTFYSHyESKEVLL 60

+S M KDRQI+KTK AIY+AFI+LLQ+ +YSKITV+D+I LANVGRSTFY+HYESKE+LL 
Sbjct: 1 VSDMTKDRQIKKTKTAIYSAFIALLQKKEYSKITVRDMITLANVGRSTFYAHYESKEMLL 60 

Query: 61 KELCEDLFHHLFKQGRDVTFEEYLVHILKHFEQNQDSIATLLLSDDPYFLLRFRSELEHD 120 
KELCE+LFHHLF+Q R+VTFE+YLVHILKHFEQN+DSIATLLLS+DPYFLLRF++ELEHD

Sbjct: 61 KELCEELFHHLFRQKRNVTFEDYLVHILKHFEQNKDSIATLLLSNDPYFLLRFKNELEHD 120 

Query: 121 VYPRLREEYITKVDIPEDFLKQFLLSSFIETLKWWLHQRQKMTVEDLLKYYLTMVE 176
VYP LR +YI K IPE FLKQF+LSSFIETLKWWLHQRQ+M+ +LLKYYL +++
Sbjct: 121 VYPNLRCKYIDKTTIPEVFLKQFVLSSFIETLKWWLHQRQRMSANELLKYYLELIK 176

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 243 

A DNA sequence (GBSx0257) was identified in S.agalactiae <SEQ ID 771> which encodes the amino acid
sequence <SEQ ID 772>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3573 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 244 

A DNA sequence (GBSx0258) was identified in S.agalactiae <SEQ ID 773> which encodes the amino acid
sequence <SEQ ID 774>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.19 Transmembrane 112 - 128 ( 107 - 131) 

INTEGRAL Likelihood = -8.07 Transmembrane 77 - 93 ( 71 - 97)

INTEGRAL Likelihood = -6.10 Transmembrane 144 - 160 ( 138 - 165) 

INTEGRAL Likelihood = -3.03 Transmembrane 165 - 181 ( 164 - 182) 

Final Results 

bacterial membrane Certainty=0.5076 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco
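Where several "INTEGRAL" lines are reported, it can help to collect the predicted transmembrane segments into a structure ordered by position. The following is a minimal sketch under stated assumptions: `Segment` and `parse_segments` are hypothetical names, and the pattern is fitted to the line shapes seen in this document (including lines that lack the parenthesised outer range).

```python
import re
from typing import NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int
    outer: Optional[Tuple[int, int]]  # parenthesised range, if present


SEG_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)(?:\s*\(\s*(\d+)\s*-\s*(\d+)\s*\))?"
)


def parse_segments(lines):
    """Return the predicted segments sorted by start position."""
    segments = []
    for line in lines:
        match = SEG_RE.search(line)
        if match:
            like, start, end, o1, o2 = match.groups()
            outer = (int(o1), int(o2)) if o1 is not None else None
            segments.append(Segment(float(like), int(start), int(end), outer))
    return sorted(segments, key=lambda seg: seg.start)


# The four segments reported above for SEQ ID 774.
segments = parse_segments([
    "INTEGRAL Likelihood =-10.19 Transmembrane 112 - 128 ( 107 - 131)",
    "INTEGRAL Likelihood = -8.07 Transmembrane 77 - 93 ( 71 - 97)",
    "INTEGRAL Likelihood = -6.10 Transmembrane 144 - 160 ( 138 - 165)",
    "INTEGRAL Likelihood = -3.03 Transmembrane 165 - 181 ( 164 - 182)",
])
for seg in segments:
    print(seg.start, seg.end, seg.likelihood)
```

Sorting by start position makes it easier to compare the segment layout of the GBS protein with that of its GAS counterpart.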

A related DNA sequence was identified in S.pyogenes <SEQ ID 775> which encodes the amino acid 
sequence <SEQ ID 776>. Analysis of this protein sequence reveals the following:

Possible site: 35 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.13 Transmembrane 112 - 128 ( 107 - 130) 
INTEGRAL Likelihood = -5.89 Transmembrane 144 - 160 ( 138 - 163)

INTEGRAL Likelihood = -5.47 Transmembrane 7 - 23 ( 6 - 29) 

INTEGRAL Likelihood = -3.50 Transmembrane 77 - 93 ( 74 - 94)

INTEGRAL Likelihood = -2.07 Transmembrane 166 - 182 ( 165 - 183) 



Final Results 

bacterial membrane Certainty=0.4652 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco



An alignment of the GAS and GBS proteins is shown below: 

Identities = 212/287 (73%) , Positives = 245/287 (84%) 



Query: 1 MTSNKKVAIAFILNISFSVLEFIFGSLFFSGAILADAVHDFGDAIAIGISATLEKKSKKD 60
M ++KKV I FILN+SFS++EFIFG+LFFSGAILADAVHDFGDAIAIGISA LE+K+ K
Sbjct: 1 MPASKKVTIIFILNLSFSLIEFIFGTLFFSGAILADAVHDFGDAIAIGISAILERKAVKK 60

Query: 61 EDTIFSLGYKRFSLLGALITSLILISGSILVMIENIPKLWHPTPVNYHGMFILAVIAIII 120
E FSLGYKRFSLLGAL T+LILISGS+LVMIE IPKLWHPT VNY GMF+LA+ AIII
Sbjct: 61 ESPNFSLGYKRFSLLGALTTNLILISGSLLVMIETIPKLWHPTIVNYDGMFVLAIFAIII 120

Query: 121 NGLASFILHSGQSKHEEILSLHFLEDILGWLAIIVISLILNWKPLYILDPLLSVAISTFI 180
NG ASFI+HS Q+K+EEILSLHFLEDILGWLAII++SLIL WKP YILDPLLS+AI++FI
Sbjct: 121 NGFASFIIHSNQTKNEEILSLHFLEDILGWLAIIILSLILKWKPWYILDPLLSIAIASFI 180

Query: 181 LSKALPKLLSTLKLFLDGVPDSIDYAALHDELKGLSQVRSINQLNIWSMDGIDNRAIIHC 240
LSKALPKL++T +FLDGVPDS IDY LH EL L + S+NQLN+WSMDGID+RA IHC
Sbjct: 181 LSKALPKLVATANIFLDGVPDSIDYCTLHHELSQLPHIVSVNQLNVWSMDGIDHRATIHC 240

Query: 241 CLNQLISEKDCKRAIRTICQHYKINDVTVEIDYSLREHQNHCKPLKN 287
CL + +EK CK++IR ICQ Y IN VTVEID SL EHQ+HC L +
Sbjct: 241 CLRESTTEKHCKKSIRLICQRYNINSVTVEIDTSLNEHQHHCSSLSS 287





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 245 

A DNA sequence (GBSx0259) was identified in S.agalactiae <SEQ ID 777> which encodes the amino acid 
sequence <SEQ ID 778>. Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.22 Transmembrane 221 - 237 ( 221 - 237) 

Final Results 

bacterial membrane Certainty=0.1489 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

There is also homology to SEQ ID 780. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 



Example 246 

A DNA sequence (GBSx0260) was identified in S.agalactiae <SEQ ID 781> which encodes the amino acid 
sequence <SEQ ID 782>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.50 Transmembrane 2 - 18 ( 1 - 18) 
Final Results

bacterial membrane Certainty=0.1999 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 247 

A DNA sequence (GBSx0261) was identified in S.agalactiae <SEQ ID 783> which encodes the amino acid 
sequence <SEQ ID 784>. This protein is predicted to be dehydrogenase (Zn-dependent). Analysis of this 
protein sequence reveals the following: 

Possible site: 15

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.77 Transmembrane 171 - 187 ( 170 - 187) 

Final Results 

bacterial membrane Certainty=0.2508 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG20655 GB:AE005134 alcohol dehydrogenase; Adh2 [Halobacterium sp. NRC-1]

Identities = 169/348 (48%), Positives = 232/348 (66%), Gaps = 9/348 (2%) 

Query: 1 MKVATFIEPGKMVITDTPKPVIEQETDAVIKIVRACVCGSDLWWYRGISKRESGSFAGHE 60 
M+ A + PG++ + + PKP IE DAVI++ VCGSDLW+YRG S RE+GS GHE

Sbjct: 1 MRAAVYQGPGEIAVEEVPKPDIESPEDAVIRVTHTAVCGSDLWFYRGDSDREAGSRVGHE 60 

Query: 61 AIGIVEEVGTKVTDVSKGDFVIVPFTHGCGQCPSCKAGFDGNCTNHQA- - -AKNVGYQGQ 117 
+GIVEEVG VT V+ GD VI PF CG+C C+ G +C ++ N G QG+ 

Sbjct: 61 PMGIVEEVGDDVTSVAPGDRVIAPFAISCGECEFCRQGLYTSCVEDESWGSEANGGGQGE 120

Query: 118 YLRYTNANWALVKIPGQPSDYDNETLNSLLTLSDVMATGYHAAATAEVKEGDTVVVMGDG 177 

Y++ A+ LV++P + +D D + L SLL L+DVM TG+HAA +A V EGDT W+GDG 
Sbjct: 121 YVKCPFADGTLVRVPDRYAD-DEDVLESLLPLTDVMGTGHHAAVSAGVGEGDTAVWGDG 179 


Query: 178 AVGLCGVIAAKMLGANRIIAMSRHKDRQELALTFGATDIVEERGDEAVKRVLDLTNQAGA 237 

AVGLCGV+AA+ LGA RIIAM H+DR ELA FGATD + RGD+A++R DLT+ GA 
Sbjct: 180 AVGLCGVLAAQRLGAERIIAMGHHEDRLELAAEFGATDTISARGDDAIERARDLTH-GGA 238 

Query: 238 DAVLECVGTEQSVDTATQIARPGAVIGRVGIP QNPDMNTNNLFWKNIGLRGGIASVT 294

+ V+ECVG ++D+A IARPG +G VG+P ++ ++ +F NI +RGG+A V 
Sbjct: 239 NHvMECVGAASAMDSAIAIARPGGTVGYVGVPYGVEDGGLDVFTMFSDNITIRGGVAPVR 298 

Query: 295 TFDKSVLLDAVLTHKINPGLVFTKS FVLDDIQKAYEAMDKRDAI KSLV 342 
+ + ++ D VL ++P +FTK+ LD + + Y AMD R+AIK LV

Sbjct: 299 AYAEELMAD-VLQGTLDPSPIFTKTVDLDGVPEGYAAMDDREAIKVLV 345 

There is also homology to SEQ ID 786. 
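
Each alignment in these examples is headed by a summary line giving Identities, Positives and Gaps as counts over the alignment length. The sketch below is an illustrative assumption, not part of the original analysis; it shows one way to read such a line and recompute the underlying fractions, so that a printed percentage can be compared against its counts. The function name and pattern are hypothetical.

```python
import re

# Matches e.g. "Identities = 169/348"; the printed percentage is ignored
# because it may be damaged by extraction.
_STAT_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line):
    """Map each statistic name to (count, alignment_length, fraction)."""
    stats = {}
    for name, count, length in _STAT_RE.findall(line):
        count, length = int(count), int(length)
        stats[name] = (count, length, count / length)
    return stats
```

For the summary line of the Halobacterium hit above, the Identities entry is 169 of 348 aligned positions, a fraction of roughly 0.486.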

A related sequence was also identified in GAS <SEQ ID 9145> which encodes the amino acid sequence
<SEQ ID 9146>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -5.41 Transmembrane 170 - 186

Final Results

bacterial membrane Certainty=0.3166 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 121/353 (34%), Positives = 182/353 (51%), Gaps = 16/353 (4%)

Query: 1 MKVATFIEPGKMVITDTPKPVIKQETDAVIKIVRACVCGSDLWWYRG-ISKRESGSFAGH 59 
MK AT++ G + + D PKPVI + TDA++++V+ +CG+DL G + + G+ GH 
Sbjct: 15 MKAATYLSTGNLQLIDKPKPVIIKPTDAIVQLVKTTICGTDLHILGGDVPACKEGTILGH 74

Query: 60 EAIGIVEEVGTKVTDVSKGDFVIVPFTHGCGQCPSCKAGFDGNCTNHQAAKN- - -VGYQG 116 

E IGIV+EVG VT+ GD VI+ C C CK G +C + G Q 

Sbjct: 75 EGIGIVKEVGDAVTNFKIGDKVIISCVTSCHTCYYCKRGLSSHCQDGGWILGHLINGTQA 134 


Query: 117 QYLRYTNANWALVKIPGQPSDYDNETLNSLLTLSDVMATGYH-AAATAEVKEGDTVVvMG 175 

+Y+ +A+ +L P D +L+ LSD++ T Y + VK GD V ++G 

Sbjct: 135 EYVHI PHADGSLYHAPDTIDD EALVMLSDILPTSYEIGVLPSHVKPGDNVCIVG 188 

Query: 176 DGAVGLCGVIAAKMLGANRIIAMSRHKDRQELALTFGATDIVEERGDEAVKRVL-DLTNQ 234

G VGL ++ + II + ++R E A TFGAT + E VK ++ D+TN 

Sbjct: 189 AGPVGLAALLTVQFFSPANI IMVDLSQNRLEAAKTFGATHTICSGSSEE VKAI IDDITNG 248 

Query: 235 AGADAVLECVGTEQSVDTATQIARPGAVIGRVGIPQNP-DMNTNNLFWKNIGLRGGIASV 293 
G D +ECVG + D +1 G I VG+ P D N + L+ KNI L G+ +

Sbjct: 249 RGVDISMECTGYPATFDICQKIISVGGHIANTOVHGKPV 308 

Query: 294 TTFDKSVLLDAVLTHKINPGLVFTKSFVI^DIQKAYEAMDKRnAIKSL-VIVD 345 
T + +LL+ + T KI+ + T F L +++KAYE A +L VI+D 

Sbjct: 309 NTTE--MLUTOjKTGKIDATRLITHHFKLSEVEKAYETFKHAGANNALKVIID 359

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 248 

A DNA sequence (GBSx0262) was identified in S.agalactiae <SEQ ID 787> which encodes the amino acid
sequence <SEQ ID 788>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2169 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD36075 GB:AE001762 hypothetical protein [Thermotoga maritima]
Identities = 55/128 (42%), Positives = 72/128 (55%), Gaps = 8/128 (6%)

Query: 8 IFPKGEKNPYGEFFIGQSYLAAIiAKSPDG--NVSVGNVTFEAGCRNNWHVHLDGYQILLV 65 
IF +G K +FF G ++ L +G N V +V FE G R +WH H G QIL+V

Sbjct: 5 I FERGSKGS - SDFFTGNVWVKMLOTDENGVFNTQVYDWFEPGARTHWHSHPGG-QILIV 62 

Query: 66 TEGSGWYQEEGKEAVSLKPGDVIVTDKGVRHWHGAKKDSEFAHIAITA GKSEFYEA 121 

T G G+YQE GK A LK GDV+ V HWHGA D E HI 1+ G +E+ + 

Sbjct: 63 TRGKGFYQERGKPARILKKGDVVEIPPNVVHWHGAAPDEELVHIGISTQVHLGPAEWLGS 122

Query: 122 VSDEEYSR 129

V++EEY + 
Sbjct: 123 VTEEEYRK 130 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 249 

A DNA sequence (GBSx0263) was identified in S.agalactiae <SEQ ID 789> which encodes the amino acid
sequence <SEQ ID 790>. This protein is predicted to be gamma-carboxymuconolactone decarboxylase.
Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4089 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA20070 GB:AL031155 3-oxoadipate enol-lactone hydrolase/4-carboxymuconolactone decarboxylase
[Streptomyces coelicolor A3(2)]
Identities = 33/93 (35%), Positives = 59/93 (62%), Gaps = 1/93 (1%)


Query: 11 QLEEFAPEFARYNDDILFGEV^AKET^ILTDKTRSIITISALISGGNLEQLEHHLQFAKQN 70 

Q +EF+ +F + +GE+W + L ++RS +T++AL++GG+L++L HL+ A +N 

Sbjct: 349 QADEFSGDFQEFLTRYAWGEIWDRPG-LDRRSRSCVTLTALVAGGHLDELAPHLRAALRN 407 

Query: 71 GVTKEEIADI ITHLAFYVGWPKAWSAFNKAKEI 103

G+T EI +++ A Y G P A AF A++ + 
Sbjct: 408 GLTPGEIKEVLLQAAVYCGVPAANGAFRVAQQV 440 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 250 

A DNA sequence (GBSx0265) was identified in S.agalactiae <SEQ ID 791> which encodes the amino acid
sequence <SEQ ID 792>. Analysis of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5529 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 251 

A DNA sequence (GBSx0266) was identified in S.agalactiae <SEQ ID 793> which encodes the amino acid
sequence <SEQ ID 794>. This protein is predicted to be a probable transcriptional regulator. Analysis of this
protein sequence reveals the following:

Possible site: 58

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9585> which encodes amino acid sequence <SEQ ID 9586>
was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAG08263 GB:AE004901 probable transcriptional regulator
[Pseudomonas aeruginosa]
Identities = 36/148 (24%), Positives = 68/148 (45%), Gaps = 22/148 (14%)

Query: 5 QIVEKPAMILAG -VTLENVKSNQEGIQQAIGICKTQPDFRFD 45 

+IVE+PA +G +E+++++ GIC QP+ F 

Sbjct: 123 RIVERPAFSWGMEYFGSAPGDTIGQLWERFIPREHEIAGKHDPEVSYGICAQQPNGEFH 182 


Query: 46 YSATYQVETSVCAPKGLEIIRIPSATYAVISVKGPMPSSLQETWRKIIQGFFQENNLKPA 105 

Y A ++V+ P+G+ ++P+ YAV + KG P + E+++ I E L+P 

Sbjct: 183 YVAGFEVQEGWPVPEGMVRFQVPAQKYAVFTHKHTAP-QIAESFQAIYSHLLAERGLEPK 241 

Query: 106 NSPNLEIYSSQH- -PQDTDYQMEIWLAI 131

+ E Y + P D + Q+++++ I 
Sbjct: 242 AGVDFEYYDQRFRGPLDPNSQVDLYIPI 269 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 252 

A DNA sequence (GBSx0267) was identified in S.agalactiae <SEQ ID 795> which encodes the amino acid
sequence <SEQ ID 796>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0887 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB84919 GB:AE000825 conserved protein [Methanothermobacter thermoautotrophicus]

Identities = 42/130 (32%), Positives = 71/130 (54%), Gaps = 3/130 (2%)




Query: 1 MITQEMKEIINSQLAIWATVDAKGQPNIGPKRSMRLWDDKTFIYNENTDGQTRINIEDNG 60 

M+T EM + I +L VAT D +G PN+ P R D++T + +N +T N+ +N 
Sbjct: 1 MMTPEMMDAIEKELVFVATADEEGTPNWPIGFARPLDERTILIADNYMKKTIRNLHENP 60 

Query: 61 KIEIAFVDRERLLGYRFVGTAEIQTEGTYYEAAKKWAEGRMG- -VPKAVGI IHVERIFNL 118

+1 + R Y+F GT EI G Y++ +WA+ M PK+ ++ VE I+++ 

Sbjct: 61 RIAL-IPQNARECPYQFKGTVEIFKSGKYFDMVVEWAQNVMTELEPKSAILMTVEEIYSV 119 

Query: 119 QSGANAGKEI 128 
+ G AG+++

Sbjct: 120 KPGPEAGEKV 129 

A related DNA sequence was identified in S.pyogenes <SEQ ID 797> which encodes the amino acid
sequence <SEQ ID 798>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0789 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 123/128 (96%), Positives = 127/128 (99%)


Query: 1 MITQEMKEIINSQLAMVATVDAKGQPNIGPKRSMRLWDDKTFIYNENTDGQTRINIEDNG 60 

MITQEMK++IN+QIAMVATVDAKGQPNIGPKRSMRLWDDKTFIYNENTDGQTRINIEDNG 
Sbjct: 1 MITQEMKDLINNQIAMVATVDAKGQPNIGPKRSMUiWDDKTFIYNENTDGQTRINIEDNG 60 

Query: 61 KIE IAFVDRERLLGYRFVGTAEIQTEGTYYEAAKKWAEGRMGVPKAVGI IHVERI FNLQS 120

KIEIAFVDRERLLGYRFVGTAEIQTEG YYEA&KKWA+GRMGVPKAVGI I HVERI FNLQS 
Sbjct: 61 KIEIAFVDRERLLGYRFVGTAEICjTEGAYYEAAKKWACjGRMGVPKAVGIIHVERIFNLQS 120 

Query: 121 GANAGKEI 128
GANAGKEI

Sbjct: 121 GANAGKEI 128 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 253

A DNA sequence (GBSx0268) was identified in S.agalactiae <SEQ ID 799> which encodes the amino acid
sequence <SEQ ID 800>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.47 Transmembrane 1028 - 1044 (1027 - 1048)

Final Results

bacterial membrane Certainty=0.3187 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

!GB:AF054892 surface antigen BspA [Bacteroides forsy..

!GB:AF054892 surface antigen BspA [Bacteroides forsy..

!GB:AF054892 surface antigen BspA [Bacteroides forsy..

!GB:AF054892 surface antigen BspA [Bacteroides forsy..

!GB:AF054892 surface antigen BspA [Bacteroides forsy..

>GP:AAC82625 GB:AF054892 surface antigen BspA [Bacteroides forsythus]

Identities = 143/566 (25%), Positives = 243/566 (42%), Gaps = 52/566 (9%)

Query: 95 VPKAKPEVTQEASNSSNDASKVEVPKQDTASKKETLETSTWEAKDFVTRGDTLVG F 150 

+P+++A + ++PTA + LT + T+G F 

Sbjct: 120 IPNSVTTIGEWAFKGCSGLKSITLPNSLTAIGQSALSGCTGLTSITIPNSOTTIGEWAFF 179 

Query: 151 SKSGINKLSQTSHLVLPSHAA- - DGTQLTQVASFAFTPDKKTAIAEYTSRLGENGKPSRL 208 

SG+ ++ + L +A LT + PD TIE + G +G S 

Sbjct: 180 GCSGLTS I TFPNSLTAIGESAFYGCGALTS IT LPDALTTIGESAFK-GCSGLKSIT 234 

Query: 209 DIDQKEIIDEGEIFNAYQLTKLTIPNGYKSIGQDAFVDNKNIAEVNLPESLETISDYAFA 268 

+ IE ++ LT +T+P+ +IG+ AF + + P SL TI + AF 

Sbjct: 235 FPNSLTTIGESAFYDCGALTSITLPDALTTIGRSAFYGCSGLKSITFPNSLTTIGESAFY 294 

Query: 269 HM-SLKQVKLPDNLKVIGEIAFFDNQIGGKLYLPRHLIKLAERAFKSNRIQTVEFLGSKL 327 

+ SL + +P+++ IG AF+ + LP L + ERAF + + T + + + 

Sbjct: 295 NCGSLTSITIPNSVTTIGRSAFYGCSGLKSITLPDGLTTIEERAFYNCGVLTSITIPNSV 354 

Query: 328 KVIGFASFQD-NmRNVMLPDGLEKIESEAFTGNPGDEHYNNQVVLRTRTGQNPHQLATE 386 

IGE++F + L+++ LPDGL IE AF N L + T N E 

Sbjct: 355 ATIGESAFYGCSGLKSITLPDGLTTIEWGAFY NCGALTS ITI PNSVSTIGE 405 

Query: 387 NTYVNPDKSLWRATPDMDYTKWLEEDFTYQKNS VTGFS NKGLQKVRRNKMLE I PKQH 443 

+ + +L T D ++ D +++ +++G G + V K ++ K+ 

Sbjct: 406 SAFYGCG - ALKDVTVAWDTPIDI QRD - VFRELTLSGIRLHVPAGKKTVYEAK ~ - DVWKEF 461 

Query: 444 NGITITEIGDNAFRNVDFQSKTLRKYDLEEIKLPSTIRKIGAFAFQSNNLKSFEASEDLE 503 

N + + G + N D +KTL + P T + + FA ++ L 
Sbjct: 462 NIVEDDDFGGLQW-NYDAATKTLTITN PTPDTPKPMPNFATPNDQLW 507 

Query: 504 EIKEGAFMNNRIGTLDLKDKLIKIGDAAFH-INHIYAIVLPESVQEIGRSAFRQNGALHL 562 

GAF I + + D + +GD AF + + +1 LP+SV IG+SAF L 
Sbjct: 508 GAFQKE - IQKITIGDGVTSVGDFAFSGCDALKSITLPKSVTTIGQSAFSGCWDLRS 562 

Query: 563 MFIGNKVKTIGEMAFLSNKLESVNLSEQKQLKTIEVQAFS-DNALSEWLPPNLQTIREE 621 

+ + + V TIGE AF + LE +++ K + I + F +L+ + LP L I ++ 
Sbjct: 563 LTLPDGVNTIGEKAFY-DCLELTSITIPKSVTAIGQETFHYCVSLTSLTLPDALTAIGKK 621 

Query: 622 AF-KRNHLKEVKGSSTLSQITFNAFD 646 

AF N L V +++ I NAFD 
Sbjct: 622 AFYSCNALTSVTFPKSITTIGENAFD 647 
Identities = 109/407 (26%), Positives = 175/407 (42%), Gaps = 48/407 (11%)

Query: 222 FNAYQLTKLTIPNGYKSIGQDAFVDNKNIAEVNLPESLETISDYAFAHMS-LKQVKLPDN 280 

F+ LT +T+PN +IG AF + + +P S+ TI ++AF S LK + LP++ 

Sbjct: 87 FSDCALTSVTLPNSLTAIGDHAFKGCSGLTSITIPNSVTTIGEWAFKGCSGLKSITLPNS 146 

Query: 281 LKVIGELAFFDNQIGGKLYLPRHLIKLAERAFKSNRIQTVEFLGSKLKVIGEASFQD-NN 339 

L IG+ A '++P++EAF T +L IGE++F 

Sbjct: 147 LTAIGQSALSGCTGLTSITIPNSVTTIGEWAFFGCSGLTSITFPNSLTAIGESAFYGCGA 206 

Query: 340 LRNVMLPDGLEKIESEAFTGNPGDEHYNNQVVBRTRTGQNPHQ1ATENTYVNPDKSLWRA 399 

L ++ LPD LI AF G G L++ T N E+ + + 

Sbjct: 207 LTSITLPDALTTIGESAFKGCSG LKSITFPNSLTTIGESAFYDCGALTSIT 257 

Query: 400 TPD^YTKWLEEDFTYQKNSVTGFSNKGLQKVRRNKNLEIPKQHNGITITEIGDNAFRNV 459 

PD ++T K++ P ++T IG++AF N 

Sbjct: 258 LPD ALTTIGRSAFYGCSGLKSITFPN SLTTIGESAFYNC 296 

Query: 460 DFQSKTLRKYDLEEIKLPSTIRKIGAFAFQS-NNLKSFEASEDLEEIKEGAFMNNRIGT- 517 

L I +P+++ IG AF + LKS + L I+E AF N + T 
Sbjct: 297 G SLTSITIPNSVTTIGRSAFYGCSGLKSITLPDGLTTIEERAFYNCGVLTS 347 

Query: 518 LDLKDKLIKIGDAAFH-INHIYAIVLPESVQEIGRSAFRQNGALHLMFIGNKVKTIGEMA 576 

+ + + + IG++AF+ + + +1 LP+ + I AF GAL + I N V TIGE A 
Sbjct: 348 ITIPNSVATIGESAFYGCSGLKSITLPDGLTTIEWGAFYNCGALTSITIPNSVSTIGESA 407

Query: 577 FLS-NKLESVNLSEQKQLKTIEVQAFSDMALSEWL--PPMLQTIRE 620

F L+ V ++ +1+ F + LS + L P +T+ E 

Sbjct: 408 FYGCGALKDVTVAWDTPI -DIQRDVFRELTLSGIRLHVPAGKKTVYE 453 
Identities = 111/465 (23%), Positives = 185/465 (38%), Gaps = 56/465 (12%)

Query: 141 VTRGDTLVGFSKSGINKLSQTSHLVLPSHAADGTQLTQVASFAF TPDKKT 190 

+T D L +S S + P+ LT + AF PD T 

Sbjct: 210 ITLPDALTTIGESAFKGCSGLKS ITFPN -SLTTIGESAFYDCGALTSITLPDALT 263 

Query: 191 AIAEYTSRLGENGKPSRLDIDQKEIIDEGEIFNAYQLTKLTIPNGYKSIGQDAFVDNKNI 250 

I ++ G +G S + I E +N LT +TIPN +IG+ AF + 
Sbjct: 264 TIGR-SAFYGCSGLKBITFPNSLTTIGESAFYNCGSLTSITIPNSVTTIGRSAFYGCSGL 322 

Query: 251 AE VNLPESLETISDYAFAHMS - LKQVKLPDNLKVIGELAFFDNQIGGKLYLPRHLI KIAE 309 

+ LP+ L TI + AF + L + +P+++ IGE AF+ + LP L + 

Sbjct: 323 KSITLPDGLTTIEERAFYNCGVLTSITIPNSVATIGESAFYGCSGLKSITLPDGLTTIEW 382 

Query: 310 RAFKSNRI QTVEFLGS KLKVTGEAS FQD - NNLRNVMLP - DGLEKI ESEAF TGNPG 362 

AF + T + + + IGE++F L++V + D 1+ + F +G 

Sbjct: 383 GAFYNCGALTSITIPNSVSTIGESAFYGCGALKDVTVAWDTPIDIQRDVFRELTLSGIRL 442 

Query: 363 DEHYNNQWLRTRTGQNPHQLATEN TYVNPDKSLWRATPDMDYTKWLEEDFTY 415 

+ V + + ++ Y K+L P D K + +F 

Sbjct: 443 HVPAGKKTVYEAKDVWKEFNIVEDDDFGGLQWNYDAATKTLTITNPTPDTPKPM-PNFAT 501 

Query: 416 QKNSOTGFSNKGLQKVRRNKNLEIPKQIiNGITITEIGDNAFRNVDFQSKTLRKYDLEEIK 475 

+ + G K +QK+ G +T +GD AF D L+ I 

Sbjct: 502 PNDQLWGAFQKEIQKIT IGDGVTSVGDFAFSGCD ALKSIT 541 

Query: 476 LPSTIRKIGAFAFQSN-NLKSFEASEDLEEIKEGAFMN-NRIGTLDLKDKLIKIGDAAFH 533 

LP ++ IG AF +L+S + + IE AF + + ++ + + IG FH 
Sbjct: 542 LPKSVTTIGQSAFSGCWDLRSLTLPDGVNTIGEKAFYDCLELTSITIPKSVTAIGQETFH 601 

Query: 534 - INHIYAIVLPESVQEIGRSAFRQNGALHLMFIGNKVKTIGEMAF 577 

+ ++ LP+++ IG+ AF AL + + TIGE AF 

Sbjct: 602 YCVSLTSLTLPDALTAIGKKAFYSCNALTSVTFPKSITTIGENAF 646 
Identities = 98/351 (27%), Positives = 152/351 (42%), Gaps = 53/351 (15%)

Query: 315 NRIQTVEFLGSKLKVIGFASFQDNl^RNVMLPDGLEKIESEAFTGNPGDEHYNNQvvIjRT 374 

++IQTV +G + +G +F D L +V LP+ L I AF G G L + 

Sbjct: 68 SKIQTVT-IGDGVTSVGNNAFSDCALTSVTLPNSLTAIGDHAFKGCSG LTS 117 

Query: 375 RTGQNPHQLATENTYVNPDKSLWRATPDMDYTKWLEEDFTYQKNSVTGFSNKGLQKVRRN 434 

T P+ + T + S ++ NS+T L 

Sbjct: 118 IT- - IPNSVTTIGEWAFKGCSGLKS IT LPNSLTAIGQSALSGCTGL 161 

Query: 435 KNLEIPKQHNGITITEIGDNAF RNVDFQSKTLRKYD LEEIKLPSTI 480 

++ IP ++T IG+ AF ++ F + + L I LP + 

Sbjct: 162 TSITIPN SVTTIGEWAFFGCSGLTSITFPNSLTAIGESAFYGCGALTSITLPDAL 216 

Query: 481 RKIGAFAFQS - NNLKSFEASEDLEE I KEGAFMN-NRIGTLDLKDKLI KIGDAAFH - INHI 537 

IG AF+ + LKS L IEAF+ +++LDL IG +AF+ + + 

Sbjct: 217 TTIGESAFKGCSGLKSITFPNSLTTIGESAFYDCGALTSITLPDALTTIGRSAFYGCSGL 276 

Query: 538 YAIVLPESVQEIGRSAFRQNGALHLMFIGNKVKTIGEMAFLS-NKLESvNLSEQKQLKTI 596 

+1 P S+ IG SAF G+L + I N V TIG AF + L+S+ L + L TI 
Sbjct: 277 KSITFPNSLTTIGESAFYNCGSLTSITIPNSVTTIGRSAFYGCSGLKSITLPD--GLTTI 334 

Query: 597 EVQAFSD-WALSEVVLPPNLQTIREEAFKR-NHLKEVKGSSTLSQITFNAF 645 

E +AF + L+ + +P ++ TI E AF + LK + L+ I + AF 

Sbjct: 335 EERAFYNCGVLTSITIPNSVATIGESAFYGCSGLKSITLPDGLTTIEWGAF 385 
Identities = 78/282 (27%), Positives = 123/282 (42%), Gaps = 46/282 (16%)

Query: 111 ISTOASKVEVPKQDTASKKETLETSTWEAKDFVTRGDTLVGFSKSGINKLSQTSHL VLPS - - 168 

N+AS E+P SK +T VT GD + + + + TS + LP+ 

Sbjct: 56 ETNAS- -EIPWHSLQSKIQT VTIGDGVTSVGNNAFSDCALTS - VTLPNSL 101 



Query: 169 HAADG TQLTQVASFAFT PDKKTAIAEYTSRLGENG 203
HA G +T + +AF P+ TAI + ++ G G

Sbjct: 102 TAIGDHAFKGCSGLTSITIPNSVTTIGEWAFKGCSGLKSITLPNSLTAIGQ-SALSGCTG 160 

Query: 204 KPSRLDIDQKEIIDEGEIFNAYQLTKLTIPNGYKSIGQDAFVDNKNIAEVNLPESLETIS 263 
S + I E F LT +T PN +IG+ AF + + LP++L TI

Sbjct: 161 LTSITIPNSVTTIGEWAFFGCSGLTSITFPNSLTAIGESAFYGCGALTSITLPDALTTIG 220 

Query: 264 DYAFAHMS - LKQVKLPDNLKVIGELAFFDNQIGGKLYLPRHLIKLAERAFKS -NRIQTVE 321 
+ AF S LK + P++L IGE AF+D + LP L + AF + ++++ 

Sbjct: 221 ESAFKGCSGLKSITFPNSLTTIGESAFYDCGALTSITLPDALTTIGRSAFYGCSGLKSIT 280

Query: 322 FLGSKLKVIGEASFQD-NNLRNVMLPDGLEKIESEAFTGNPG 362 

F S3j IGE++F + +L ++ +P+ +1 AF G G 
Sbjct: 281 FPNS - LTTIGESAFYNCGSLTSITI PNSVTTIGRSAFYGCSG 321 
Identities = 43/144 (29%), Positives = 70/144 (47%), Gaps = 4/144 (2%)

Query: 220 EIFNAYQ--LTKLTIPNGYKSIGQDAFVDNKNIAEVNLPESLETISDYAFAHM-SLKQVK 276 

+++ A+Q + K+TI +G S+G AF + + LP+S+ TI AF+ L+ + 

Sbjct: 505 QLWGAFQKEIQKITIGDGVTSVGDFAFSGCDALKSITLPKSVTTIGQSAFSGCWDLRSLT 564 


Query: 277 LPDNLKVIGELAFFDNQIGGKLYLPRHLIKLAERAFKSmiQTVEFLGSKLKVIGEASFQ 336 

LPD + IGE AF+D + +P+ + + + F T L L IG+ +F 

Sbjct: 565 LPDGVNTIGEKAFYDCLELTSITIPKSVTAIGQETFHYCTSLTSLTLPDALTAIGKKAFY 624 

Query: 337 D-NNLRNVMLPDGLEKIESEAFTG 359

N L +V P + I AF G 
Sbjct: 625 SCNALTSVTFPKSITTIGENAFDG 648 
Identities = 43/134 (32%), Positives = 66/134 (49%), Gaps = 12/134 (8%)

Query: 511 MNNRIGTLDLKDKLIKIGDAAFHINHIYAIVLPESVQEIGRSAFRQNGALHLMFIGNKVK 570

+ ++I T+ + D + +G+ AF + ++ LP S+ IG AF+ L + I N V 
Sbjct: 66 LQSKIQTVTIGDGVTSVGNNAFSDCALTSVTLPNSLTAIGDHAFRGCSGLTSITIPNSvT 125 

Query: 571 TIGEMAFLS-NKLESVIttSEQKQLKTIEVQAFSD-NALSEvvLPPNLOTIREEAFKRNHL 628 
TIGE AF + L+S+ L LI AS L+ + +P ++ TI E AF

Sbjct: 126 TIGEWAFKGCSGLKSITL- -PNSLTAIGQSALSGCTGLTSITIPNSVTTIGEWAF 178 

Query: 629 KEVKGSSTLSQITF 642 
G S L+ ITF 
Sbjct: 179 FGCSGLTS ITF 189

A related DNA sequence was identified in S.pyogenes <SEQ ID 801> which encodes the amino acid
sequence <SEQ ID 802>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -2.44 Transmembrane 984 - 1000 ( 984 - 1001)

Final Results

bacterial membrane Certainty=0.1977 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



LPXTG motif: 975-979 
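
The reported motif position can be illustrated with a short search. The sketch below is an assumption for illustration only, not the method used in the original analysis: it scans a sequence for the pattern L-P-any-T-G and reports 1-based start and end positions. The function name is hypothetical, and the fragment in the usage note was assembled by hand from the Sbjct rows of the GAS/GBS alignment in this example (residues 968 onwards), so it inherits any extraction errors in those rows.

```python
import re

# "X" in LPXTG stands for any residue, hence the "." in the pattern.
_LPXTG_RE = re.compile(r"LP.TG")


def find_lpxtg(sequence, offset=1):
    """Return (start, end) positions of each LPXTG match.

    `offset` is the 1-based position of the first residue of `sequence`
    within the full-length protein (1 for a complete sequence).
    """
    return [
        (m.start() + offset, m.end() - 1 + offset)
        for m in _LPXTG_RE.finditer(sequence)
    ]
```

For the hand-assembled fragment `"RGRHSAILPRTGSKG"` with `offset=968`, the function returns `[(975, 979)]`, which agrees with the motif position stated above.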

An alignment of the GAS and GBS proteins is shown below: 

Identities = 751/1050 (71%), Positives = 861/1050 (81%), Gaps = 45/1050 (4%)

Query: 3 KKHLKTLAIALTTVSvVrYSQEVYGLEREESVKQEQTQSA-SEDDWFEEDNERKTNVSKE 61 
KKHLKT+AL LTTVSWT+ +QEV+ L +E +KQ Q S+ S D+ E + K +++

Sbjct: 2 KKHLKWALTLTWSVVTHNQEVFSLvKEPILRQTQASSSISGADYAESSGKSKLKINET 61 

Query: 62 NSTVDETVSDLFSDGNSNNSSSKTESWSDPRQVPKAKPEVTQEASNSSNDASKVEVPKQ 121 
+ VD+TV+DLFSD + K +Q KA E T E+ S++E K+ 

Sbjct: 62 SGPVDDTVTDLFSDKRTTPEKIKDNLARGPREQELKAVTENT-ESEKQITSGSQLEQSKE 120

Query: 122 DTASKKETLETSTWEAKDFVTRGDTLVGFSKSGINKLSQTSHLVLPSHAADGTQLTQVAS 181

+ K TS WE DF+T+G+TLVG SKSG+ KLSQT HLVLPS AADGTQL QVAS 
Sbjct: 121 SLSLNKTVPSTSNWEICDFITKGNTLVGLSKSGVEKDSQTDHLVLPSQAADGTQLIQVAS 180 


Query: 182 FAFTPDKKTAIAEYTSRLGENGKPSRLDIDQKEI IDEGEI FNAYQLTKLTI PNGYKS IGQ 241 

FAFTPDKKTAIAEYTSR GENG+ S+LD+D KEII+EGE+FN+Y L K+TIP GYK IGQ 
Sbjct: 181 FAFTPDKKTAIAEYTSRAGENGEISQIjDVDGKEIINEGEVFNSYLLKKVTIPTGYKHIGQ 240 

Query: 242 DAFVDNKNIAEVNLPESLETISDYAFAHMSLKQVKLPDNLKVIGELAFFDNQIGGKLYLP 301

DAFVDNKNIAEVNLPESLETISDYAFAH++LKQ+ LPDNLK IGELAFFDNQI GKL LP 
Sbjct: 241 DAFVDNKNIAEVNLPESLET I SDYAFAHLALKQIDLPDNLKAIGELAFFDNQITGKLSLP 300 

Query: 302 RHLIKLAERAFKSNRIQTWFLGSKLKVIGFASFQDM^LRNVMLPDGLEKIESEAFTGNP 361 
R L++LAERAFKSN I+T+EF G+ LKVIGEASFQDN+L +MLPDGLEKI ESEAFTGNP

Sbjct: 301 RQLMRLAERAFKSNHIKTIEFRGNSLKVIGEASFQDNDLSQLMLPDGLEKIESEAFTGNP 360 

Query: 362 GDEHYMNQVVLRTRTGQNPHQIATEJNrTYVNPDKSLWRATPDMDYTKWLEEDFTYQKNSVT 421 
GD+HYNN+VVL T++G+NP IATENTYVNPDKSLW+ +P++DYTKWLEEDFTYQKNSVT 
Sbjct: 361 GDDHYNNRWLWTKSGKNPSGLATENTYVNPDKSLWQES PE I DYTKWLEEDFTYQKNSVT 420

Query: 422 GFSNKGLQKVRRNKNLEIPKQHNGITITEIGDNAFRNVDFQSKTLRKYDLEEIKLPSTIR 481 

GFSNKGLQKV+RNKNLEIPKQHNG+TITEIGDNAFRNVDFQ+KTLRKYDLEE+KLPSTIR 
Sbjct: 421 GFSNKGLQKOTKNKHLEIPKQHNGVTITEIGDNAFRNVDFQNKTLRKYDLEEVKLPSTIR 480 


Query: 482 KIGAFAFQSNNLKSFEASEDLEEIKEGAFMNNRIGTLDLKDKLIKIGDAAFHINHIYAIV 541 

KIGAFAFQSNNLKS FEAS+DLEE I KEGAFMNNRI TL+LKDKL+ IGDAAFHINH1YAIV 
Sbjct: 481 KIGAFAFQSNNLKS FFASDDLEEIKEGAFMNNRIETLELKDKLVTIGDAAFHINHIYAIV 540 

Query: 542 LPESVQEIGRSAFRQNGALHLMFIGNKVKTIGEMAFLSNKLESVNLSEQKQLKTIEVQAF 601

LPESVQEIGRSAFRQNGA +L+F+G+KVKT+GEMAFLSN+LE ++LSEQKQL I VQAF 
Sbjct: 541 LPESVQEIGRSAFRQNGANNLIFMGSKVKTLGEMAFLSNRLEHLDLSEQKQLTEIPVQAF 600 

Query: 602 SDNALSEVVLPPNLQTIREEAFKRNHLKEWGSSTLSQITFNAFDQMDGDKRFGKKVVVR 661 
SDNAL EV+LP +L+TIREEAFK+NHLK+++ +S LS I FNA D NDGD++F KVW+

Sbjct: 601 SDNALKEVLLPASLKTIREEAFKKNHLKQLEVASALSHIAFNALDDNDGDEQFDNKVVVK 660 

Query: 662 THNNSHMIADGERFIIDPDKLSSTMVDLEKVLKIIEGLDYSTLRQTTQTQFREMTTAGKA 721 
TH+NS+ LADGE FI+DPDKLSST+VDLEK+LK+IEGLDYSTLRQTTQTQFR+MTTAGKA 
Sbjct: 661 THHNSYALADGEHFIVDPDKLSSTIVDLEKILKLIEGLDYSTLRQTTQTQFRDMTTAGKA 720

Query: 722 LLSKSNLRQGEKQKFLQEAQFFLGRVDLDKAIAKAEKALVTKKATKNGHLLERSINKAVL 781 

LLSKSNLRQGEKQKFLQEAQFFLGRVDLDKAIAKAEKALVTKKATKNG LLERS INKAVL 
Sbjct: 721 LLSKSNLRQGEKQKFLQEAQFFLGRVDLDKAIAKAEKALVTKKATKNGQLLERS INKAVL 780


Query: 782 AYNNSAIKKANVKRLEKELDLLTDLVEGKGPLAQATMVQGVYLLKTPLPLPEYYIGIJSIVY 841 

AYNNSAI KKANVKRLEKELDLLT LVEGKGPLAQATMVQGVYLLKTPLPLPEYYIGLNVY 
Sbjct: 781 AYNNSAIKKANVKRLEKELDLLTGLVEGKGPIAQATWQGWLLKTPLPLPEYYIGLNVY 840 

Query: 842 FDKSGKLIYALDMSDTIGEGQKDAYGNPILNVDEDNEGYHTIAVATIjADYEGLYIKDILN 901

FDKSGKLIYALDMSDTIGEGQKDAYGNPILNVDEDNEGYH LAVATLADYEGL IK ILN 
Sbjct: 841 FDKSGKLIYALDMSDTIGEGQKIJAYGNPILNVDEDNEGYHAIjAVATLADYEGLDIKTILN 900 

Query: 902 SSLDKIKAIRQI PLAKYHRLGI FQAIRNAAAEADRLLPKTPKGYLNEVPNYRKKQVEKNL 961 
S L ++ +IRQ+P A YHR GIFQAI+NAAAEA++LLPK

Sbjct: 901 SKLSQLTSIRQVPTAAYHRAGIFQAIQNAAAEAEQLLPK 939 

Query: 962 KPVDYKTPIFNKALPNEKVDGDRAAKGHNINAETNNSVAVTPIRSEQQLHKSQSDVNLPQ 1021 

++++ + N++ ++S + ++ + LP+ 

Sbjct: 940 PGTHSEKSSSSESANSKDRG LQSNPKTNRGRHSAILPR 977

Query: 1022 TSSKNNFIYEILGYVSLCLLFLVTAGKKGK 1051 

T SK +F+Y ILGY S+ LL L+TA KK K 
Sbjct: 978 TGSKGSFVYGILGYTSVALLSLITAIKKKK 1007 






SEQ ID 800 (GBS97) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 17 (lane 12; MW 113.4 kDa).

GBS97-His was purified as shown in Figure 193, lane 6.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 254 

A DNA sequence (GBSx0269) was identified in S.agalactiae <SEQ ID 803> which encodes the amino acid
sequence <SEQ ID 804>. This protein is predicted to be ribonucleoside-diphosphate reductase alpha chain
(nrdE). Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4274 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB96160 GB:AE000050 ribonucleoside-diphosphate reductase alpha
chain~MPN324 (new), 513 (Himmelreich et al., 1996)
[Mycoplasma pneumoniae]
Identities = 476/725 (65%), Positives = 586/725 (80%), Gaps = 20/725 (2%)

Query: 2 TQSD- -AYLSnNAKTRFRDRTGNYHFTSDKEAVEQYMIEHVEPNTMVFTSLIEKLDYLVS 59 

TQ D +Y+SLNA T+ F D AVE Y+ EHV+P T VF S E+LD+LV 

Sbjct: 12 TQEDLESYISLNAYTKVYG DFKMDLHAVEAYIQEHvKPKTKVFHSTKERLDFLVK 66 

Query: 60 NNYYESDLLKQYNLEFICQIFEHAYAKKFAFIJIFMGALKFYNAYALKTEDNRYYLEHYED 119 

N+YY+ +++ Y+ E +1 AYA +F + NFMGA KFYNAYALKT D ++YLE+YED 
Sbjct: 67 NDYYDENIIN^SFEQFEEITRKAYAYRFRYANFMGAFKFYNAYALKTFDGKWYLENYED 126 

Query: 120 RVVMNALFIAAGDEKAAYDLVDDMLANRFQPATPTFLNAGKKRRGEYISCYLLRIEDNME 179 

RWMN LFLA G+ A L+ ++ NRFQPATPTFLNAG+K+RGE++SCYLLRIEDNME 
Sbjct: 127 RWMNVLFLANGNYNKALKLLKQIITNRFQPATPTFI^ 186 

Query: 180 SISRAISTSLQLSKRGGGVALCLTNLREFGAPIKGIKNQATGIVPVMKLLEDSFSYANQL 239 

SI RAI+T+LQLSKR GGVAL LTN+RE GAPIK I+NQ++GI+P+MKLLEDSFSYANQL 
Sbjct: 187 S IGRAITTTLQLSKRDGGVALLLTNIRESGAPIKKIENQSSGI I PIMKLLEDSFSYANQL 246 

Query: 240 GQRQGAGAVYLHAHHPEVLTFLDTKRENADEKIRIKSLSLGLVIPDITFELAKANKDMAL 299 

GQRQGAGAVYLHAHHP+V+ FLDTKRENADEKI RI KSLSLGLVI PDI TF LAK N++MAL 
Sbjct: 247 GQRQGAGAWLHAHHPDVMQFLDTKRENADEKIRI KSLSLGLVI PDITFTLAKNNEEMAL 306 

Query: 300 FSPYDIERVYGKPMSDISITEEYETLLANADIRKTFISARKLFQTIAELHFESGYPYILF 359 

FSPYD+ YGKP+SDIS+TE Y LLAN I+KTFI+ARK FQT+AELHFESGYPYILF 
Sbjct: 307 FSPYDVYEEYGKPLSDISvTEMYYELLANQRIKKTFINARKFFQTVAELHFESGYPYILF 366 

Query: 360 EDTVNAKNPHKKEGRIWSNLCSEIAQVOTASQFSEDLTFTKVGHDVCCNLGSINIARAM 419 

+DTVN +N H RIVMSNLCSEI Q +T S+F DL F KVG+D+ CNLGS+NIA+AM 
Sbjct: 367 DD1VNRRNAH--PNRIVMSNLCSEIVQPSTPSEFHHDLAFKKVGNDISCNLGSLNIAKAM 424 

Query: 420 DQAADFEKL IANS IRALDRVSRTSDLDSAPS IKKGNAANHAVGLGAMNLHGFLATNHIYY 479 
+ +F +L+ +1 +LD VSR S+L++APSI+KGN+ NHA+GLGAMNLHGFLATN IYY

Sbjct: 425 ESGPEFSELVKIAIESLDLVSRVSNLETAPSIQKGNSENHALGLGAMNLHGFLATNQIYY 484 

Query: 480 DSQEAIDFTDCFFYAMAYYAFKASNHLAKEKGTFEGFSESSYADGSYFYQY--TEQNF-E 536 

+s EAIDFT+ FFY +AY+AFKAS+ LA EKG F+ F + +ADGSYF +Y E +F 
Sbjct: 485 NSPEAIDFTNIFFYTVAYHAFKASSELMiEKGKFKNFEOTKFADGSYFDKYIKVEPDFWT 544 

Query: 537 PKTQRVTCNLLAEYGLTLPSQEDWRKLVQSIKEIGIjANAHLIAVAPTGSISYLSSCTPSLQ 596 

PKT+RVK L +Y + +P++E+W++L +I++ GIjAN+HLLA+APTGSISYLSSCTPSLQ 
Sbjct: 545 PKTERVKALFQKYQVEIPTRENWKELAMIQKNGLANSHLLAIAPTGSISYLSSCTPSLQ 604

Query: 597 PWSPVEVRKEGALGRVYVPAYKIDADNYVYYKKGAYEVGSEAIINIAAAAQKHIDQAIS 656

PWSPVEVRKEG LGR+YVPAY+++ D+Y +YK GAYE+G E I INIAAAAQ+H+DQAIS 
Sbjct: 605 PWSPVEVRKEGRLGRIYVPAYQLNKDSYPFYKDGAYELGPEPIINIAAAAQQHVDQAIS 664 

Query: 657 LTLFMTDQATTRDIiNKAYIQAFKQKCASIYYVRVRQDILEGSESYDDMLDDFTSSDLEDC 716 

LTLFMTD+ATTRDLNKAYI AFK+ C+SIYYVRVRQ++LE SE + + +4- C 

Sbjct: 665 LTLFMTDKATTRDLNKAYIYAFKKGCSSIYYVRWQEVLEDSEDH TIQMQQC 716 

Query: 717 QSCMI 721 
++C+I 

Sbjct: 717 EACVI 721 

A related DNA sequence was identified in S.pyogenes <SEQ ID 805> which encodes the amino acid
sequence <SEQ ID 806>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1843 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC82625 GB:AF054892 surface antigen BspA [Bacteroides forsythus] 
Identities = 124/451 (27%), Positives = 202/451 (44%), Gaps = 65/451 (14%) 

Query: 221 FNSYLLKKvTIPTGYKHIGQDAFVDNKNIAEVNLPESLETISDYAFAHIA-LKQIDLPDN 279
F+ L VT+P IG AF + + +P S+ TI ++AF + LK I LP++
Sbjct: 87 FSDCALTSVTLPNSLTAIGDHAFKGCSGLTSITIPNSVTTIGEWAFKGCSGLKSITLPNS 146

Query: 280 LKAIGELAFFDNQITGKLSLPRQLMRLAERA-FKSNHIRTIEFRGNSLKVIGEASFQD-N 337
L AIG+ A +++P + + EAF + + +IF NSL IGE++F
Sbjct: 147 LTAIGQSALSGCTGLTSITIPNSVTTIGEWAFFGCSGLTSITF-PNSLTAIGESAFYGCG 205

Query: 338 DLSQLMLPDGLEKIESEAFTGNPGDDHYNNRVVLVJTKSGKNPSGLATENTYVNPDKSLWQ 397
L+ + LPD L I AF G G KS P+ L T +S +
Sbjct: 206 ALTS ITLPDALTTIGESAFKGCSG LKSITFPNSLTTIG ESAFY 248

Query: 398 ESPEIDYTKWLEEDFTYQKNSVTGFSNKGLQKVKRNKNLEIPKQHNGVTITEIGDNAFRN 457
+ + + T +++ G S GL K++ P ++T IG++AF N
Sbjct: 249 DCGALTSITLPDALTTIGRSAFYGCS--GL KSITFPN SLTTIGESAFYN 295

Query: 458 VDFQNKTLRKYDLEEVKLPSTIRKIGAFAFQS-NNLKSFEASDDLEEIKEGAFMNNRIET 516
L + +P+++ IG AF + LKS D L I+E AF N + T
Sbjct: 296 CG SLTSITIPNSVTTIGRSAFYGCSGLKSITLPDGLTTIEERAFYNCGVLT 346

Query: 517 -LELKDKLVTIGDAAFH-INHIYAIVLPESVQEIGRSAFRQNGANNLIFMGSKVKTLGEM 574
+ + + + TIG++AF+ + + +1 LP+ +1 AF GA I + +V T+GE
Sbjct: 347 SITIPNSVATIGESAFYGCSGLKSITIiPDGLTTIEWGAFYNCGALTSITIPNSVSTIGES 406

Query: 575 AFLS-NRLEHLDLSEQKQLTEIPVQAFSDNALKEVLL--PASLKTIREEAFKKNHLKQLE 631
AF L+ + ++ + +1 F + L + L PA KT+ E K+ K+
Sbjct: 407 AFYGCGALKDVTVAWDTPI -DIQRDVFRELTLSGIRLHVPAGKKTVYE- - -AKDVWKE- - 460

Query: 632 VASALSHIAFNALDDND-GDEQFDNKWVKT 661
FN ++D+D G Q++ KT
Sbjct: 461 FNIVEDDDFGGLQWNYDAATKT 482





An alignment of the GAS and GBS proteins is shown below: 

Identities = 534/726 (73%), Positives = 614/726 (84%), Gaps = 5/726 (0%)

Query: 1 MTQSDA-YLSLNAKTRFRDRTGNYHFTSDKEAVEQYMIEHVEPNTMVFTSLIEKLDYLVS 59
M+Q++A YLSLNA TRF+ G+YHF SDKEAV +Y+ EHV PN M F SL +KL YL++
Sbjct: 1 MSQTNASYLSLNALTRFKKPDGSYHFDSDKFJWRRYLEEHVSPNQMAFNSLEDKLAYLIN 60

Query: 60 IMYYESDLLKQYNLEFICQIFEHAYAKKFAFIjNFMGSUjKFYNAYALKTEDNRYYLEHYED 119 
YYE + Y + I + F +AY + + FLN MGA+KFY +YALKT D + YLE +ED

Sbjct: 61 EGYYEQAIFDAYPNDLIKEAFHYAYQQGYRFLNLMGAMKFYQSYALKTLDGKQYLETFED 120 

Query: 120 RVVMNALFIAAGDEKAAYDLVDDMiyiNRFQPATPTFMAGKKRRGEYISCYLLRIEDNME 179 
R VMNALFLA GD+ +D++D +L RFQPATPTFIMAGKKRRGEYISCYLLR+EDNME 
Sbjct: 121 RAVMNALFLADGDQTFVFDVIDAILHRRFQPATPTFLNAGKKRRGEYISCYLLRVEDNME 180

Query: 180 SISRAISTSLQLSKRGGGVALCLTNLREFGAPIKGIKNQATGIVPVMKLLEDSFSYANQL 239 

SISRAISTSLQLSKRGGGVALCLTNLRE GAPIKGI+NQATGIVPVMKLLEDSFSYANQL 
Sbjct: 181 SISRAISTSLQLSKRGGGVALCLTNLREIGAPIKGIENQATGIVPVMKLLEDSFSYANQL 240 


Query: 240 GQRQGAGAVYLHAHHPEVLTFLDTKRENADEKIRIKSLSLGLVIPDITFELAKANKDMAL 299 

GQRQGAGAVYLHAHHPEVLTFLDTKRENADEKIRIKSL+LGLVIPDITF+LAK NKDMAL 
Sbjct: 241 GQRQGAGAVYLHAHHPEVLTFLDTKRENADEKIRIKSLALGLVIPDITFQLAKENKDMAL 300 

Query: 300 FSPYDIERVYGKPMSDISITEEYETLLANADIRKTFISARKLFQTIAELHFESGYPYILF 359

FSPYDI+R YGK MSDISITEEY+ LLAN I+KT+ISARK FQ IAELHFESGYPY+LF 
Sbjct: 301 FSPYDIKRAYGKDMSDISITEEYDKLLANPAIKKTYISARKFFQLIAELHFESGYPYLLF 360 

Query: 360 EDTVNAKNPHKKEGRIVMSNLCSEIAQVNTASQFSEDLTFTKVGHDVCCNLGSINIARAM 419 
+DTVN +NPH K+GRIVMSNLCSEIAQV+T S F EDL+F +G D+ CCNLGS INIA+AM

Sbjct: 361 DDTVNKRNPHAKKGRIVMSNLCSEIAQVSTPSTFKEDLSFETIGEDICCNLGSINIAQAM 420 

Query: 420 DQAADFEKLIANSIRALDRVSRTSDLDSAPSIKKGNAAimAVGLGAMNLHGFLATNHIYY 479 
A FE+LI SIRALDRVSR SDL+ APS++ GNAANHAVGLGAMNLHGFLATNHIYY 
Sbjct: 421 ADAPHFEQLITTSIRALDRVSRVSDLN^PSVETGNAaNHAVGLGAMNLHGFIATNHIYY 480

Query: 480 DSQEAIDFTDCFFYAMAYYAFKASNHIAKEKGTFEGFSESSYADGSYFYQYTEQNFEPKT 539 

D++EA+DFTD FF+AMAYYAFKAS LAKEKG F GFS S+Y+DG+YF +Y +++ +P+T 
Sbjct: 481 DTKFAVDFTDLFFHA^YYAFKASCQLAKEKGAFAGFSLSTYSDGTYFAKYLQEDAKPQT 540 


Query: 540 QRVKNLLAEYGLTLPSQEDTOKLVQSIKEIGLANAHLLAVAPTGSISYLSSCTPSLQPW 599 

+V LL +YG TLP+ DW+ LV IK+ GLANAHLLAVAPTGSISYLSSCTPSLQPW 
Sbjct: 541 AKVATLLQDYGFTLPWADWQALVADIKQFGLANAHLLAVAPTGSISYLSSCTPSLQPVV 600 

Query: 600 SPVEVRKEGALGRVYVPAYKIDADNYVYYKKGAYEVGSEAIINIAAAAQKHIDQAISLTL 659

+PVEVRKEG+LGR+YVPAY+ID NY YY++GAYEVG +AII++ AAAQKH+DQAI SLTL 
Sbjct: 601 APVEVRKEGSLGRIYVPAYQIDQANYAYYERGAYEVGPKAI IDWAAAQKHVDQAI SLTL 660 

Query: 660 FMTDQATTRDLNKAYIQAFKQKCASIYYVRVRQDILEGSESYDD MLDDFTSSDLED 715 

FMTDQATTRDLN++YIQAFKQ CAS I YYVRVRQD+L GSE YD+ + +

Sbjct: 661 FMTDQATTRDLNRSYIQAFKQNCASIYYVRVRQDVLAGSEQYDEDSLVTAPGASDETTTE 720 

Query: 716 CQSCMI 721 
CQSCMI 

Sbjct: 721 CQSCMI 726

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
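
The summary line that heads each alignment in this section (for example "Identities = 534/726 (73%), Positives = 614/726 (84%), Gaps = 5/726 (0%)") can be read mechanically. The following is a minimal sketch only, not part of the original analysis; the function name `parse_alignment_stats` and the returned field names are hypothetical, and the pattern is written to tolerate the stray spaces that appear in the scanned text.

```python
import re

# Matches "Name = num/den (pct%)", tolerating stray spaces left by scanning.
_STAT = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_stats(line):
    """Return a dict keyed by statistic name (hypothetical helper)."""
    stats = {}
    for name, num, den, pct in _STAT.findall(line):
        stats[name.lower()] = {
            "count": int(num),              # matching positions
            "length": int(den),             # alignment length
            "reported_percent": int(pct),   # percentage as printed
            "fraction": int(num) / int(den),
        }
    return stats


if __name__ == "__main__":
    line = "Identities = 534/726 (73%) , Positives = 614/726 (84%) , Gaps = 5/726 (0%)"
    for name, s in parse_alignment_stats(line).items():
        print(name, s["count"], s["length"], f"{s['fraction']:.3f}")
```

The printed percentage is kept separately from the recomputed fraction, since the two need not agree exactly after rounding in the original output.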

Example 255 

A DNA sequence (GBSx0270) was identified in S.agalactiae <SEQ ID 807> which encodes the amino acid
sequence <SEQ ID 808>. This protein is predicted to be nrdI protein (nrdI). Analysis of this protein
sequence reveals the following:






Possible site: 54 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.2952 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC71451 GB:U39702 nrdI protein (nrdI) [Mycoplasma genitalium]
Identities = 77/127 (60%), Positives = 104/127 (81%), Gaps = 1/127 (0%)

Query: 7 WYFSSKSNNTHRFVQKLACSNQRIPSD-GSSILVTEDYILIVPTYAGGGDDTKGAVPKQ 65 

+VYFSS SNNTHRF++KL ++RIP D SI V+ +Y+LI PTY+GGG+ +GAVPKQ 
Sbjct: 22 IWFSSISNOTHRFIEKLGFQHKRIPVDITQSITVSNEYVLICPTYSGGGNQVEGAVPKQ 81 


Query: 66 WQFLNVRQISREHCQGVISSGNlWFGDTYAIAGPIIARKIiHVPLLHQFELLGTQEDVTRV 125 

V+QFLN + NRE C+GVI +SGNTNFGDT+ +AG +I++KLNVPLL+QFELLGT+ DV + 
Sbjct: 82 VIQFLNNKHKRELCRGVIASGNTNFGDTFCIAGTVISKK^ 141 

Query: 126 KELLCQF 132

++++ F 
Sbjct: 142 QKIIANF 148 

A related DNA sequence was identified in S.pyogenes <SEQ ID 809> which encodes the amino acid 
sequence <SEQ ID 810>. Analysis of this protein sequence reveals the following:

Possible site: 54 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0089 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 84/125 (67%), Positives = 100/125 (79%)

Query: 7 WYFSSKSNNTHRFVQKLACSNQRIPSDGSSILVTEDYILIVPTYAGGGDDTKGAVPKQV 66 

+VYFSSKSNNTHRFVQKL QRIP D + V+ Y+LIVPTYA GG D KGAV KQV 
Sbjct: 6 IVYFSSKSNNTHRFVQKLGLPAQRIPVDNRPLEVSTHYIjLIVPTYAAGGSDAKGAVSKQV 65 


Query: 67 VQFLNVRQNREHCQGVISSGNTNFGDTYAIAGPIIARKI^OTPLLHQFELLGTQEDVTRVK 126 

++FLN NR+HC+GVISSGNTNFGDT+A+AGPII++KL VPLLHQFELLGT DV +V+ 
Sbjct: 66 IRFLNNPNNRKHCKGVISSGNTNFGDTFALAGPIISQKLQVPLLHQFELLGTATDVKKVQ 125 

Query: 127 ELLCQ 131

+ + 

Sbjct: 126 AIFAR 130 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 256 

A DNA sequence (GBSx0271) was identified in S.agalactiae <SEQ ID 811> which encodes the amino acid
sequence <SEQ ID 812>. This protein is predicted to be ribonucleoside-diphosphate reductase beta chain 
(nrdF). Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3889 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB96162 GB:AE000050 ribonucleoside-diphosphate reductase beta chain [Mycoplasma pneumoniae]
Identities = 261/335 (77%), Positives = 301/335 (88%)

Query: 2 QSYYDRSQSPLDYALSEKAFPMRSVNWNKLNDDKDLEVTHNRVT^ 61

+ Y+ S SPL+YA + +RSVNWN ++D+KDLEVWNR+TQNFWLPEKIPVSND+ S 
Sbjct: 5 KKYFLESVSPLEYAQKKFQGNLRSVNWNLVDDEKDLEvWNRITQNFWLPEKIPVSNDIPS 64 

Query: 62 WRTLDADWQQLITRTFTGLTLLDSVQATVGDIAQIKHSQTDHEQVIYANFAFMVAIHARS 121 
W+ L +WQ LIT+TFTGLTLLD++QAT+GDI QI ++ TDHEQVIYANFAFMV +HARS

Sbjct: 65 WKQLSKEWQDLITKTFTGLTLLDTIQATIGDIKQIDYALTDHEQVIYAMFAFMVGVHARS 124 

Query: 122 YGTIFSTLCTSQQIEEAHEWWDTESLQARSRILIPFYTGDDPLKSKVAAAMMPGFLLYG 181 
YGTIFSTLCTS+QI EAHEWW TESLQ R++ LIP+YTG DPLKSKVAAA+MPGFLLYG 
Sbjct: 125 YGTIFSTLCTSEQITEAHEWWKTESLQKRAKALIPYYTGKDPLKSKVAAALMPGFLLYG 184

Query: 182 GFYLPFYLSARGKLPNTSDIIRLILRDKVIHNYYSGYKYQQKVAKLSVEKQAEMKTFVFD 241 

GFYLPFYLS+R +LPNTSDI IRLILRDKVIHNYYSGYK+Q+KV K+S EKQAEMK FVFD 
Sbjct: 185 GFYLPFYLSSRKQLPNTSDIIRLILRDKVIHNYYSGYKFQRKVEKMSKEKQAEMKRFVFD 244 


Query: 242 LLYQLIDLEKAYLYELYDGFDLAEDAIRFSIYNAGKFLQNLGYDSPFTEEETRISPEVFA 301 

L+Y+LI+LEKAYL ELY+GF + EDAI +FS IYMAGKFLQNLGYDSPFTEEETRI PE+FA 
Sbjct: 245 LMYELIELEKAYLKELYEGFGI VEDAIKFSIYNAGKFLQNLGYDSPFTEEETRIKPEIFA 304 

Query: 302 QLSARADENHDFFSGNGSSYIMGITEETLDEDWEF 336

QLSARADENHDFFSGNGSSY+MGI+EET D+DW+F 
Sbjct: 305 QLSARADENHDFFSGNGSSYVMGISEETEDKDWDF 339 

A related DNA sequence was identified in S.pyogenes <SEQ ID 813> which encodes the amino acid 
sequence <SEQ ID 814>. Analysis of this protein sequence reveals the following:

Possible site: 18 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3779 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 292/335 (87%), Positives = 318/335 (94%)

Query: 2 QSYYDRSQSPLDYALSEKAFP^SVNWNKI^DKDLEW^VTQNFWLPEKIPVSNDLNS 61 

Q YY+RSQSP++YALSE +RS+NWN LNDDKDLEVWNRVTQNFWLPEK+PVSNDLNS 
Sbjct: 3 QHYYERSQSPIEYALSETQKQLRSINWNYLNDDKDLEVWNRVTQNFWLPEKVPVSNDLNS 62 


Query: 62 WRTLDADWQQLITRTFTGLTLLDSVQATVGDIAQIKHSQTDHEQVIYANFAFMVAIHARS 121 

WR+L DWQQLITRT+TGLTLLD+VQATVGD+AQI+HSQTDHEQVIY NFAFMV IHARS 
Sbjct: 63 WRSLGEDWQQLITRTYTGLTLLDTVQATVGDVAQIQHSQTDHEQVIYTNFAFMVGIHARS 122 

Query: 122 YGTIFSTLCTSQQIEEAHEWVVDTESLQARSRILIPFYTGDDPLKSKVAAAMMPGFLLYG 181

YGTIFSTLC+S+QIEEAHEWW T+SLQ R+R+LIP+YTGDDPLKSKVAAAMMPGFLLYG 
Sbjct: 123 YGTIFSTLCSSEQIEEAHEWWSTQSLQDRARVLIPYYTGDDPLKSKVAAAMMPGFLLYG 182 

Query: 182 GFYLPFYLSARGKLPNTSDIIRLILRDKVIHNYYSGYKYQQKVAKLSVEKQAEMKTFVFD 241 
GFYLPFYLSARGK+PNTSDIIRLILRDKVIHNYYSGYKYQQKVA+LS EKQAEMK FVFD

Sbjct: 183 GFYLPFYLSARGKMPNTSDIIRLILRDKVIHNYYSGYKYQQKVARLSPEKQAEMKRFVFD 242 

Query: 242 LLYQLIDLEKAYLYELYDGFDLAEDAIRFSIYNAGKFLQNLGYDSPFTEEETRISPEVFA 301 
LLY+LIDLEKAYL ELY GFDLAEDAIRFS+YNAGKFLQNLGY+SPFT+EETR+SPEVFA 
Sbjct: 243 LLYELIDLEKAYLRELYAGFDLAEDAIRFSLYNAGKFLQNLGYESPFTDEETRVSPEVFA 302

Query: 302 QLSARADENHDFFSGNGSSYIMGITEETLDEDWEF 336 

QLSARADENHDFFSGNGSSY+MGITEET D+DWEF 
Sbjct: 303 QLSARADENHDFFSGNGSSYVMGITEETTDDDWEF 337 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 257 

A DNA sequence (GBSx0272) was identified in S.agalactiae <SEQ ID 815> which encodes the amino acid 
sequence <SEQ ID 816>. This protein is predicted to be rhamnosyltransferase. Analysis of this protein
sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1741 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9583> which encodes amino acid sequence <SEQ ID 9584>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 






>GP:BAA32090 GB:AB010970 rhamnosyltransferase [Streptococcus mutans] 
Identities = 104/309 (33%), Positives = 173/309 (55%), Gaps = 21/309 (6%)

Query: 11 QINICLATYNGQKYLRQQLDSIIQQGYTDWICLIRDDGSTDDTVAIIKEYvNRDSRFIFI 70 

++NI ++TYNGQ+++ QQ+ SI +Q + +W LIRDDGS+D T II ++ D+R FI 
Sbjct: 2 KVNILMSTYNGQEFIAQQIQSIQKD/TFENWNLLIRDDGSSDGTPKIIJU3FAKSDARIRFI 61 

Query: 71 NSNDDRKLGSHRSFYELVNYKKADFYVFSDQDDVWKENRLERYLEEAEKENQELPLLVYS 130

N++ G ++FY L+ Y+KAD+Y FSDQDDVW +LE L EK N ++PL+VY+ 

Sbjct: 62 NADKRENFGVIKNFYTLLKYEKADYYFFSDQDDVWLPQKLELTLASVEKENNQIPLMVYT 121 

Query: 131 NWTSVDEKLTVL KEHNPATVIQEQIAFNQINGMVIMMNHELAKLWE- -YRQIG 181 

+ T VD L VL + H+ T + E++ N + G +M+NH LAK W+ Y +

Sbjct: 122 DLTVVDRDLQVLHDSMIKTQSHHANTSLLEELTENT^ 181 

Query: 182 AHDSYVGTLAYAVGNVAYI SDSTVLWRRQ VGAES LNNYGRQYG - VATFWQMI 232 

HD Y+ LA ++G + Y+ ++T L+R+ +GA + L N+ R + V +W ++ 

Sbjct: 182 MHDWYIALLAASLGKLIYLDETTELYRQHESNVLGARTWSKRLKNWLRPHRLVKKYWWLV 241

Query: 233 NTSFDRASLIFAQVSDKMSLERKLFFSRFIELKNANLMRRIYLLSKLKLRRKSLKETVAM 292 

+S +AS + + + K ++L + + + RIL+ + T 

Sbjct: 242 TSSQQQASHL LELDLPAANKAI IRAYVTLLDQSFLNRI KWLKQYGFAKNRAFHTFVF 298 






Query: 293 TILLLTGYG 301 

L++T +G 
Sbjct: 299 KTLIITKFG 307 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 258 

A DNA sequence (GBSx0273) was identified in S.agalactiae <SEQ ID 819> which encodes the amino acid 
sequence <SEQ ID 820>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.19 Transmembrane 1213 - 1229 (1211 - 1230)

Final Results

bacterial membrane --- Certainty=0.2678 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9581> which encodes amino acid sequence <SEQ ID 9582> 
was also identified.

There is also homology to SEQ ID 822. 

A related GBS gene <SEQ ID 8525> and protein <SEQ ID 8526> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7
SRCFLG: 0
McG: Length of UR: 3
Peak Value of UR: 2.28
Net Charge of CR: 4
McG: Discrim Score: 1.29
GvH: Signal Score (-7.5): 2.84
Possible site: 30
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 31
ALOM program count: 0 value: 1.16 threshold: 0.0
PERIPHERAL Likelihood = 1.16 344
modified ALOM score: -0.73

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 1197-1201

SEQ ID 8526 (GBS 147) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 29 (lane 4; MW 132kDa). 

The GBS147-His fusion product was purified (Figure 200, lane 5) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 286), which confirmed that the protein is immunoaccessible
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 259 

A DNA sequence (GBSx0274) was identified in S.agalactiae <SEQ ID 823> which encodes the amino acid
sequence <SEQ ID 824>. This protein is predicted to be Acetyltransferase (GNAT) family. Analysis of this 
protein sequence reveals the following: 






Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2781 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG03505 GB:AE004449 conserved hypothetical protein [Pseudomonas aeruginosa] 




Identities = 66/143 (46%), Positives = 94/143 (65%), Gaps = 5/143 (3%)

Query: 2 WNVKTFDNLTTHELFQIYKLRVSVFWEQDCPYQEVDDEDLI - -CLHGMNWVDGQLA&YY 59 

W K +LT EL+ + +LR VFWEQ CPYQEVD DL+ H M W DGQL AY 
Sbjct: 5 WTCKHHADLTLKELYALLQLRTEVFVVEQKCPYQEVDGLDLVGDTHHLMAWRDGQLLAYL 64 

Query: 60 RLIP---EDDKVHLGRVIVNPDFRKKGLGNQLVEYAIKFSEANYPNKPIYAQAQAYLQDF 116 

RL+ + +V +GRV+ + R +GLG+QL+E A++ +E + + P+Y AQA+LQ + 

Sbjct: 65 RLLDPVRHEGQWIGRWSSSAARGQGLGHQLMERALQAAERLWLDTPVYLSAQAHLQAY 124 

Query: 117 YQSFGFQPVSDIYLEDNI PHLDM 139 

Y +GF V+++YLED+IPH+ M 
Sbjct: 125 YGRYGFVAVTEVYLEDDIPHIGM 147 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 260 

A DNA sequence (GBSx0275) was identified in S.agalactiae <SEQ ID 825> which encodes the amino acid 
sequence <SEQ ID 826>. Analysis of this protein sequence reveals the following:

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2010 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
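
Each "Final Results" block in this section lists three candidate locations (cytoplasm, membrane, outside) with a certainty value and a verdict. As an illustration only, and not part of the original analysis, such a block could be reduced to a single predicted location as sketched below. The helper names `parse_final_results` and `predicted_location` are hypothetical, and the pattern is an assumption written to accept both the scanned spacing (for example "Certainty=0. 2010") and a cleaned form.

```python
import re

# One result line: compartment, optional dashes, certainty score, verdict.
_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)"  # candidate compartment
    r"\W*Certainty\s*=\s*"                       # any dashes or spaces before the score
    r"(\d[\d.\s]*?)\s*"                          # score, possibly split by scan spaces
    r"\((Affirmative|Not Clear)\)"               # verdict as printed
)


def parse_final_results(text):
    """Map each compartment to (certainty, verdict); hypothetical helper."""
    results = {}
    for compartment, raw_score, verdict in _RESULT.findall(text):
        results[compartment] = (float(raw_score.replace(" ", "")), verdict)
    return results


def predicted_location(text):
    """Return the affirmative compartment with the highest certainty, or None."""
    affirmative = {
        name: score
        for name, (score, verdict) in parse_final_results(text).items()
        if verdict == "Affirmative"
    }
    return max(affirmative, key=affirmative.get) if affirmative else None


if __name__ == "__main__":
    block = """bacterial cytoplasm Certainty=0. 2010 (Affirmative) < suco
bacterial membrane --- Certainty=0. 0000 (Not Clear) < suco
bacterial outside Certainty=0. 0000 (Not Clear) < suco"""
    print(predicted_location(block))
```

Lines whose score was damaged beyond a digit-and-dot pattern are simply skipped rather than guessed at.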

Example 261 

A DNA sequence (GBSx0276) was identified in S.agalactiae <SEQ ID 827> which encodes the amino acid 
sequence <SEQ ID 828>. Analysis of this protein sequence reveals the following:

Possible site: 39 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2935 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12631 GB:Z99108 similar to RNA methyltransferase [Bacillus subtilis]

Identities = 217/448 (48%), Positives = 298/448 (66%), Gaps = 4/448 (0%)

Query: 7 QRIPLKIKRMGINGEGIGFYKKTLIFVPGALKGEEVFCQISSVRRNFAEAKLLKINKKSK 66 
Q PL IKR+GINGEG+G++KK ++FVPGAL GEEV Q + V+ F+E ++ KI K S+ 
Sbjct: 16 QTFPLTIKRLGINGEGVGYFKKKWFVPGALPGEEVVVQATKVQPKFSEGRIKKIRKASE 75



Query: 67 NRVEPPCSIYKECGGCQIMHLQYDKQLEFKTDVIRQALMKFKPEGYENYEIRKTIGMSEP 126 
+RV PPC +Y++CGGCQ+ HL Y +QL K D++ Q+L + EN EI++TIGM P
Sbjct: 76 HRVAPPCPVYEQCGGCQLQHLAYSQQIjREKRDIVIQSLERHTKFKVENMEIKETIGMDNP 135

Query: 127 3HYRAKLQFQV-RSFGGITVKAGLYAQGTHRLIDIKDCLVQDSLTQEMINRVAELLGKYKL 185
+YR K QFQ+ RS G++ AGLY +H ++ IKDC+VQ T + V +L + +
Sbjct: 136

Query: 186
+YNERK G VRT++ R +GEVQ++ +T+K +++V + + PE+K++ N
Sbjct: 196

Query: 243
+N +KTS I+G+ T+ + G+ I E + D F LS RAF+QLNP+QT LY E KA +
Sbjct: 256

Query: 303
+ ++DAYCGVGTIG+ A K VRGMD+I E+I DAK+NA G N Y G AE
Sbjct: 316

Query: 363
+P+W EGFR + +IVDPPRTG D L+TI K+ P++ VYVSCN STIA+DL TL+K
Sbjct: 376

Query: 423
Y V YIQ VDMFP TA EAV +L K
Sbjct: 436 DYRVDYIQPVDMFPQTAHVEAVARLVTiK 463

A related DNA sequence was identified in S.pyogenes <SEQ ID 829> which encodes the amino acid 
sequence <SEQ ID 830>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2980 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 327/450 (72%), Positives = 397/450 (87%)



Query: 1
M V +KQ+IPLKIKRMGINGEGIGFY+KTL+FVPGALKGE++FCQI++V+RNFAEAKLL
Sbjct: 1

Query: 61
+NK SKNRV+P CS+Y+ CGGCQIMHL Y KQL+FK DVIRQAL KFKP GYE +EIR T
Sbjct: 61

Query: 121
+GM +P+HYRAKLQFQ+RSFGG VKAGL++QG+HRL+ I +CLVQD LTQ++IN++ +L+
Sbjct: 121

Query: 181
KYKLPIYNERKIAG+RT+M+R+AQAS +VQ+I ++SK + + + EL + FP++KTVA
Sbjct: 181

Query: 241
+N N SK+S+IYG TE++WGQE+I+EEVLDYGF+LSPRAFYQLNP+QT++LY E VKAL
Sbjct: 241

Query: 301
DV D +IDAYCGVG+IG AFAGKVKSVRGMDIIPEAI+DA++NA MGF N +YEAGK
Sbjct: 301

Query: 361 AEDIIPRWYSEGFRANALIVDPPRTGLDDKLIiOTILKMPPEKMvYVSCINTSTLARDLvTL 420
AEDII +WY +G+RA+A+IVDPPRTGLDDKLL TIL P++MVYVSCNTSTLARDLV L
Sbjct: 361 AEDIISKWYKQGYRADAVIVDPPRTGLDDKliLKTIIjHyQPKQMVWSCOTSTLARDLVQL 420

Query: 421 TKVYHVHYIQSVDMFPHTARTEAWKLQRK 450
TKVY VHYIQSVDMFPHTARTEAWKLQ++
Sbjct: 421 TKVYDVHYIQSVDMFPHTARTEAWKLQKR 450

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 262 

A DNA sequence (GBSx0277) was identified in S.agalactiae <SEQ ID 831> which encodes the amino acid
sequence <SEQ ID 832>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3505 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB04643 GB:AP001510 unknown conserved protein in B. subtilis [Bacillus halodurans]
Identities = 74/263 (28%), Positives = 141/263 (53%), Gaps = 9/263 (3%)

Query: 3 ITKIEKKKR LYTLEL-DNTENLY ITEDTIVHFMLSKGMIINAEKLENIKKFAQL 55

IT+IE +KR Y + + N +++Y + E ++ L KG+ I+AE+++ I ++ 
Sbjct: 4 ITRIEVQKRNNERYNIFIHQNGQDWAFSVDEQVL1KQGLRKGLDIDAEQMKQILYEDEV 63 

Query: 56 SYGKNLGLYYISFKQRTEKEVIKYLQQHDIDSKIIPQ1IDNLKSENWINDKNYVQSFIQQ 115 
NL L+Y+S++ R+ EV YL++ D + II ++ L + ++D + ++FIQ
Sbjct: 64 QKTFNLALHTLSYRMRSVHEVRTYLKKKDREEPIIEHVLHRLTEQRLLDDHAFAEAFIQT 123

Query: 116 NLNTGDKGPYVIKQKLLQKGIKSKIIESELQAINFQDLASKISQKLYKKYQNKLPLKAL- 174 
T KGP +KQ+L +KG+ K IE L ++++ ++ L K+ +h 
Sbjct: 124 KRATTSKGPLKLKQELAEKGVSEKTIEGALTTFSYEEQVEQVKAWLEKQKGRTFKGSSLA 183

Query: 175 -KDKLMQSLTTKGFDYQIVHTVIQNLEIEKDQELEEDLIYKELDKQYQKLSKKHDQYELK 233 

K KL + L KG+ ++ ++ I++++E E + + +K +K + K +EL+ 

Sbjct: 184 WKQKLSRQLLAKGYTSPVIEEAFADVPIKQEEEEEWEALKAFGEKAMRKYAGKKTGWELQ 243 






Query: 234 QRIINALMRKGYQYEDIKSALRE 256 

Q++ AL RKG+ E 1+ L + 
Sbjct: 244 QKVKQALYRKGFSLEMIERYLND 266 

A related DNA sequence was identified in S.pyogenes <SEQ ID 833> which encodes the amino acid
sequence <SEQ ID 834>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2388 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 146/258 (56%), Positives = 190/258 (73%)



Query: 1 MKITKIEKKKRLYTLELDNTENLYITEDTIVHFMLSKGMIINAEKLENIKKFAQLSYGKN 60
MKITKIEKKKRLY +ELDN E+LY+TEDTIV FMLSK +++ ++LE++K FAQLSYGKN
Sbjct: 1 MKITKIEKKKRLYLIELDNDESLYTOEDTIWFMLSKDKVLDNDQLEDMKHFAQLSYGKN 60

Query: 61 LGLYYISFKQRTEKEVIKYLQQHDIDSKIIPQIIDNLKSENWINDKNYVQSFIQQNLNTG 120
L LY++SF+QR+ K+V YL++H+I+ II II L+ E WI+D ++I+QN G
Sbjct: 61 IALYFLSFQQRSNKQVADYLRKHEIEEHIIADIITQLQEEQWIDDTKLADTYIRQNQLNG 120

Query: 121 DKGPYVIKQKLLQKGIKSKIIESELQAINFQDLASKISQKLYKKYQNKLPLKALKDKLMQ 180
DKGP V+KQKLLQKGI S 1+ L +F LA K+SQKL+ KYQ KLP KALKDK+ Q
Sbjct: 121 DKGPQVLKQKLLQKGIASHDIDPILSQTDFSQLAQKVSQKLFDKYQEKLPPKALKDKITQ 180

Query: 181 SLTTKGFDYQIVHTVIQNLEIEKDQELEEDLIYKELDKQYQKLSKKHDQYELKQRIINAL 240
+L TKGF Y + + +L ++D + EDL+ KELDKQY+KLS+K+D Y LKQ++ AL
Sbjct: 181 ALLTKGFSYDLAKHSLNHLNFDQDNQEIEDLLDKELDKQYRKLSRKYDGYTLKQKLYQAL 240

Query: 241 MRKGYQYEDIKSALREYL 258
RKGY +DI LR YL
Sbjct: 241 YRKGYNSDDINCKLRNYL 258





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 263 

A DNA sequence (GBSx0278) was identified in S.agalactiae <SEQ ID 835> which encodes the amino acid 
sequence <SEQ ID 836>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3912 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB04659 GB:AP001510 unknown conserved protein in B. subtilis [Bacillus halodurans]
Identities = 96/175 (54%), Positives = 122/175 (68%)

Query: 1 MRLPKEGDFITIQSYKHDGSLHRTWRDTMvLKTTENALIGVNDHTLVTENDGRRVfVTREP 60 

M PK G I IQSYKH+GS+HR W +T+VLK T +IG ND LV E+DGR W TREP 
Sbjct: 1 MNFPKVGSKIQIQSYKHNGSIHRIWEETIVLKGTSKVVIGGNDRILVKESDGRHWRTREP 60 

Query: 61 AIWFHKKYWFNIIAMIRETGVSYYCMASPYILDPEALKYIDYDLDVKVFADGEKRLLD 120 

AI YF + WEN I MIR G+ +YCNL +P+ D EALKYIDYDLD+KVF D +LLD 
Sbjct: 61 AICYFDSEQWFNTIGMIRADGIYFYCNLGTPFTWDEEALKYIDYDLDIKVFPDMTFKLLD 120 

Query: 121 VDEYEQHKAQMNYPTDIDYILKENVKILVEWINENKGPFSSSYINIWYKRYLELK 175 

DEY H+ M YP +ID IL+ +V LV WI++ KGPF+ ++ WY+R+L+ + 
Sbjct: 121 EDEYAMHRKMMKYPPEIDRILQRSVDELVSWIHQRKGPFAPQFVESWYERFLQYR 175 

A related DNA sequence was identified in S.pyogenes <SEQ ID 837> which encodes the amino acid 
sequence <SEQ ID 838>. Analysis of this protein sequence reveals the following: 

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3912 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below:

Identities = 155/177 (87%), Positives = 165/177 (92%)

Query: 1 MRLPKEGDFITIQSYKHDGSLHRTWRDTMVLKTTENALIGVM)HTLvTENDGRRWvTREP 60 

M+LPKEGDFITIQSYKHDGSLHRTWRDTMVLKTiraftLIGVNDHTLVTE+DGRRWVTREP 
Sbjct: 1 MKLPKEGDFlTIQSYKHDGSLHRTV^T^WLKTTE]m J IGvlroHTLVTESDGRJlVm'REP 60 

Query: 61 AIWFHKKYWFNIIAMIRETGVSYYCNLASPYILDPEALKYIDYDLDVKVFADGEKRLLD 120 

AI VYFHKKYWFNI IAMIR+ GVSYYCNLASPY++D EALKYIDYDLDVKVFADGEKRLLD 
Sbjct: 61 AIWFHKKYWFNIIAMIRDNGVSYYCNIASPYMMDTEALKYIDYDLDVKVFADGEKRLLD 120 

Query: 121 VDEYEQHKAQMNYPTDIDYILKEWKILvEWINENKGPFSSSYINIWYKRYLEJjKKR 177 

VDEYE HK +M Y D+D+ILKENVKILV+WIN KGPFS +YI IWYKRYLELK R 
Sbjct: 121 VDEYEIHK^MQYSADMDFILKENVKILVDWINHEKGPFSKAYITIWYKRYLELKNR 177 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 264 

A DNA sequence (GBSx0288) was identified in S.agalactiae <SEQ ID 839> which encodes the amino acid 
sequence <SEQ ID 840>. This protein is predicted to be jag protein. Analysis of this protein sequence 
reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1666 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB07782 GB:AP001520 spoIIIJ-associated protein [Bacillus halodurans]

Identities = 54/198 (27%), Positives = 98/198 (49%), Gaps = 6/198 (3%)

Query: 100 DVvEEYIEEVDETLEKEDVSQPELPKIDDKNVVTTSEAIEKIDbLPNIEVAAAQVTKYVE 159 
+ VE+ I E+ T E+ + E PK ++ + A+ ++ + P+ + ++E 

Sbjct: 13 EAVEQAI IELGTTRERITYTWEEPKSGLFGILGSKPAVIEVWKPD PVDRAKAFLE 69

Query: 160 NIIYEMDLDA- -TIETTTSKRQINLQIETPEAGRIIGYHGKVLKSLQLLAQNYLHDRFSK 217 

++ EMD++ TIE + N+ E + G +IG G+ L SLQ L + + 

Sbjct: 70 ELLQEMDMEvEVTIEKDPATVLFNISGEQ-DLGTLIGKRGQTLDSLQYLVNLVANKEEGE 128


Query: 218 SFSVSINVHDYVEHRTETLIDFSKKIARRVLETNEPYHMDPMSNSERKTVHKTIATIEGV 277 

+ ++ +Y R E L+ ++++A + L T P ++PMS ERK +H + + V 
Sbjct: 129 FIRIKLDAENYRARRKEALVQLAERLASKALRTKRPVSLEPMSAHERKIIHTALQELGDV 188 

Query: 278 ESYSEGNDPNRFVWTKK 295

E+YSEG R W+ K 
Sbjct: 189 ETYSEGQGIGRHWIAPK 206 

A related DNA sequence was identified in S.pyogenes <SEQ ID 841> which encodes the amino acid
sequence <SEQ ID 842>. Analysis of this protein sequence reveals the following:

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3721 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below:

Identities = 176/302 (58%), Positives = 223/302 (73%), Gaps = 32/302 (10%)

Query: 23 MVLFTGATVEEAIEKGLQELNISRLRMIK^ 82 

MVLFTG TVEEAIE GLQEL +SRL+AH1KV+S+EKKGFLGFGKKPA+V+1EGI+D+ 
Sbjct: 1 MVLFTGKTVEEAIETGLQELGLSRLKAHIKyiSKEKKGFLGFGKKPAQVDIEGISDKTVY 60 

Query: 83 INESVALKNI KNVPS- -SVDWEEYIEEVDETLEKEDVSQPELPRIDDK 129 

+ A + + +N P+ S DV E 1+ + LE ED L D 

Sbjct: 61 KADKKATRGVPEDimQHTPAVNSADVEPEEIKAT-QRLEAEDTKVVPLMSEDSPAQTPS 119 

Query: 130 NWTTSEA IEKIDL LPNIEVAAAQVTKYVENIIYEMDLDATI 171 

VT ++A +E+ ++ +IE AA +V+ YV IIYEMD++AT+ 

Sbjct: 120 KI^TVTSTKAQQPSIPVEESEVPQDAGNDGFSKDIEKAAQEVSDYVTKIIYEMDIEATV 179 

Query: 172 ETTTSKRQINLQIETPEAGRIIGYHGKVLKSLQLIAQNYLHDRFSKSFSVSINVHDYVEH 231 

ET+ ++RQINLQIETPEAGR+IGYHGKVLKSLQLLAQN+LHDR+SK+FSVS+NVHDYVEH 
Sbjct: 180 ETSNNFJIQINLQIETPEAGRVIGYHGKVLKSLQLIAQNFLHDRYSKNFSVSIiNVHDYVEH 239 

Query: 232 RTETLIDFSKKIARRVLETNEPYHMDPMSNSERKTVHKTIATIEGVESYSEGiNDPNRFW 291 

RTETLIDF++K+A+RVLE+ + Y MDPMSNSERK VHKT+++IEGV+SYSEGNDPNR+W 
Sbjct: 240 RTETLIDFTQKVAKRVLESGQDYTMDPMSNSERKIVHKTVSS1EGVDSYSEGNDPNRYW 299 

Query: 292 VT 293 
V+ 

Sbjct: 300 VS 301 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 265 

A DNA sequence (GBSx0290) was identified in S.agalactiae <SEQ ID 843> which encodes the amino acid 
sequence <SEQ ID 844>. This protein is predicted to be 60 kd inner-membrane protein (yidC). Analysis of 
this protein sequence reveals the following: 

Possible site: 42 

>>> May be a lipoprotein 

INTEGRAL Likelihood = -7.38 Transmembrane 54 - 70 ( 52 - 75) 

INTEGRAL Likelihood = -5.20 Transmembrane 193 - 209 ( 192 - 211) 

INTEGRAL Likelihood = -3.61 Transmembrane 125 - 141 ( 124 - 144) 

INTEGRAL Likelihood = -2.44 Transmembrane 158 - 184 ( 167 - 184) 



Final Results 

bacterial membrane --- Certainty=0.3951 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
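
The "INTEGRAL Likelihood ... Transmembrane" lines reported above can be collected into an ordered list of predicted membrane-spanning segments. The sketch below is illustrative only and is not part of the original analysis. The names `Segment` and `parse_integral_lines` are hypothetical, and the parenthesised range is stored as printed under the assumed field names `outer_start` and `outer_end`, since the text does not state its meaning.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float   # value printed after "Likelihood ="
    start: int          # first residue of the predicted segment
    end: int            # last residue of the predicted segment
    outer_start: int    # parenthesised range, kept as printed (meaning assumed)
    outer_end: int


_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_lines(text):
    """Return predicted segments sorted by start position (hypothetical helper)."""
    segments = [
        Segment(float(lik), int(a), int(b), int(c), int(d))
        for lik, a, b, c, d in _INTEGRAL.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)


if __name__ == "__main__":
    report = """INTEGRAL Likelihood = -7.38 Transmembrane 54 - 70 ( 52 - 75)
INTEGRAL Likelihood = -5.20 Transmembrane 193 - 209 ( 192 - 211)
INTEGRAL Likelihood = -3.61 Transmembrane 125 - 141 ( 124 - 144)
INTEGRAL Likelihood = -2.44 Transmembrane 158 - 184 ( 167 - 184)"""
    for seg in parse_integral_lines(report):
        print(seg.start, seg.end, seg.likelihood)
```

Sorting by start position lists the segments in sequence order, whereas the report itself lists them by likelihood.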

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA78595 GB:Z14225 SpoIIIJ [Bacillus subtilis]
Identities = 79/243 (32%), Positives = 142/243 (57%), Gaps = 5/243 (2%)

Query: 1 MKKKLKTFSLILLTGSLLVAOG- - RGEVSSHSATLWEQ- IVYAFAKS IQWLS - - FNHS I G 55 

MK+++ ++ LL C + +++ S W++ +VY ++ I +++ +■ G 

Sbjct: 1 MKRRIGLLLSNWGVFMLLAGCSSVKEPITADSPHFWDKYVVYPLSELITYVAKLTGDNYG 60 

Query: 56 LGIILFTLIIRAIMMPLYNMQMKSSQKMQEIQPRLKELQKKY^ 115 

L IIL T++IR +++PL Q++SS+ MQ +QP +++L++KY KD + KL E ++ 
Sbjct: 61 LSIILOTILIRLLILPLMIKQLRSSKAMQAI^PE^KLKEKYSSKDQKTQQKLQQETMAL 120 

Query: 116 YKAEGVNPYASVLPLLIQLPVLWALFQALTRVSFLKVGTFLSLELSQPDPYYILPvIASi 175 

++ GVNP A P+LIQ+P+L + A+ R + +FL +L + DPYYILP++A + 
Sbjct: 121 FQKHGVNPLAGCFPILIQMPILIGFYHAIMRTQAISEHSFLWFDLGEKDPYYILPIVAGV 180 



Query: 176 FTFLSTWLTNKAAVEKNIALTLMTYvMPFIlLvTSFNEASGWLYWTVSNAFQVFQILLL 235
TF+ L ++N + +M ++MP +I+V + NF + + LYW V N F + Q L+

Sbjct: 181 ATFVQQKLMMAGNAQQNPQMAMMLWIMPIMIIVFAINFPAALSLYWWGNLFMIAQTFLI 240 

Query: 236 NNP 238 
P

Sbjct: 241 KGP 243 

A related GBS sequence was identified <SEQ ID 10783> which encodes amino acid sequence <SEQ ID 
10784>. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 845> which encodes the amino acid
sequence <SEQ ID 846>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> May be a lipoprotein 

INTEGRAL Likelihood = -6.32 Transmembrane 198 - 214 ( 197 - 220)

INTEGRAL Likelihood = -5.52 Transmembrane 59 - 75 ( 57 - 80) 

INTEGRAL Likelihood = -4.25 Transmembrane 130 - 146 ( 129 - 150) 

INTEGRAL Likelihood = -2.28 Transmembrane 173 - 189 ( 170 - 189) 

Final Results

bacterial membrane --- Certainty=0.3527 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAA05234 GB:D26185 stage III sporulation [Bacillus subtilis] 
Identities = 90/249 (36%), Positives = 150/249 (60%), Gaps = 6/249 (2%)

Query: 16 IVPLVLLLVACG--RGEVTAQSSSGWDQ-LVYLFARAIQWLS--FDGSIGVGIILFTLTI 70
+V + +LL C + +TA S WD+ +VY + I +++ + G+ IIL T+ I
Sbjct: 13

Query: 71
RL+++PL Q++SS+ MQ +QPE+++L+ KY+ KD +T+ KL +E+ AL++K+GVNP A
Sbjct: 73

Query: 131
P+LIQMP++I + A+ R + +FLW +L + D Y+LP++A V TF+
Sbjct: 133

Query: 191
++N M +M+++MP+MI N + + LYW VNF + Q L+ P K
Sbjct: 193

Query: 250
Q+ ++K
Sbjct: 253



An alignment of the GAS and GBS proteins is shown below: 

Identities = 172/270 (63%), Positives = 217/270 (79%), Gaps = 1/270 (0%)



Query: 1 MKKKLKTFSLILLTGSLLVACGRGEVSSHSATLWEQIVYAFAKSIQWLSFNHSIGLGIIL 60 

+KK +K ++ L LLVACGRGEV++ S++ W+Q+VY FA++IQWLSF+ SIG+GIIL 
Sbjct: 7 VKKNIKIARIVPLV-LLLVACGRGEVTAQSSSGWDQLVYLFARAIQWLSFDGSIGVGIIL 65 

Query: 61 FTLIIRAI^PLYNMQMKSSQ^QEIQPRLKEI^KKYPGKDPDNRLKLNDEMQSMYKAEG 120 

FTL IR ++MPL+NMQ+KSSQKMQ+IQP L+ELQ+KY GKD R+KL +E Q++YK G 
Sbjct: 66 FTLTIRLMLMPLFNMQIKSSQKMQDIQPELRELQRKYAGKBTQTRMKLAEESQALYKKYG 125 

Query: 121 VNPYASVLPLLIQLPVLWALFQALTRVSFLKVGTFLSLELSQPDPYYILPVIAALFTFLS 180

VNPYAS+LPLLIQ+PV+ ALFQALTRVSFLK GTFL +EL+Q D Y+LPVLAA+FTFLS 
Sbjct: 126 WPYASLLPLLIQMPVMIALFQALTRVSFLKTGTFLWVELAQHDHLYLLPVLAAVFTFLS 185

Query: 181 TWLTNKAAVEKNIALTLMTYVMPFIILVTSENFASGVVLYWTVSNAFQVFQILLLKNPYK 240

TWLTN AA EKN+ +T+M YVMP +1 FN ASGvVLYWTVSNAFQV Q+LLLNNP+K 
Sbjct: 186 TWLTNLAAKEKNVMMTVMIYVMPLMIFFMGFNLASGVVL^ 245 

Query: 241 1 1 KVREEATO vAHEKEQRVKRAKRKASKKR 270 

II R+ E+ R +RA++KA K++ 

Sbjct: 246 I IAERQRLANEEKERRLRERRARKKAMKRK 275 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8527> and protein <SEQ ID 8528> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: 20 Crend: 5 
McG: Discrim Score: 4.90 
GvH: Signal Score (-7.5): -0.39 

Possible site: 42 
>>> May be a lipoprotein 
modified ALOM score: 1.98



*** Reasoning Step: 3 

Final Results

bacterial membrane Certainty=0.3951 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

32.8/62.3% over 242aa 

Bacillus subtilis 

EGAD|17722| stage III sporulation protein j precursor Insert characterized
OMNI|NT01BS4782 -identity Insert characterized

SP|Q01625|SP3J_BACSU STAGE III SPORULATION PROTEIN J PRECURSOR. Edit characterized
GP|40023|emb|CAA44401.1| |X62539 unnamed protein product Insert characterized
GP|467388|dbj|BAA05234.1| |D26185 stage III sporulation Insert characterized
GP|2636651|emb|CAB16141.1| |Z99124 alternate gene name: spo0J87 Insert characterized
PIR|I40437|I40437 stage III sporulation protein spoIIIJ - Insert characterized

ORF02221(301 - 1014 of 1413) 

EGAD|17722|BS4098(3 - 245 of 261) stage III sporulation protein j precursor {Bacillus
subtilis} OMNI|NT01BS4782 -identity SP|Q01625|SP3J_BACSU STAGE III SPORULATION PROTEIN J
PRECURSOR. GP|40023|emb|CAA44401.1| |X62539 unnamed protein product {Bacillus
subtilis} GP|467388|dbj|BAA05234.1| |D26185 stage III sporulation {Bacillus
subtilis} GP|2636651|emb|CAB16141.1| |Z99124 alternate gene name: spo0J87 {Bacillus
subtilis} PIR|I40437|I40437 stage III sporulation protein spoIIIJ - Bacillus subtilis
%Match =17.0 

%Identity =32.8 %Similarity =62.2 

Matches = 79 Mismatches = 88 Conservative Sub.s = 71 

219 249 279 309 339 393 420 

DFWIARKGVEELDYQALEKNLIHVLKIAGLI *KGIKLKKKLKTFSLILLTGSLLVACG- -RGE VSSHSATLWEQ- IVYA 

= lb- = - III : =:: I :|:: =11 
MLLKRRIGLLLSMVGVFMLIAGCSSVKEPITADSPHFWDKYVVYP 
10 20 30 40 



474 504 534 564 594 624 654 

FAKSIQWLS--FNHSIGLGIILFTLIIRAIMMPLYNMQMKSSQKMQEIQPRLKELQKKYPGKDPDNRLKLNDEMQSMYKA



WO 02/34771 



-333- 



PCT/GB01/04789 




Query: 81 RKIRHVLLSQKTALQDYDFWIARKGVEELDYQALEKNLIHVLKIAGLI 129 
RKIRHV+++ L+ DFWIARKGV L+YQ L++NL HVLK+A L+ 
Sbjct: 61 RKIRHVIMALGHQLKSEDFWIARKGVHSLEYQELQQNLHHVLKLAQLL 109

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 267

A DNA sequence (GBSx0292) was identified in S.agalactiae <SEQ ID 851> which encodes the amino acid
sequence <SEQ ID 852>. This protein is predicted to be NAD-dependent glycerol-3-phosphate
dehydrogenase (gpsA). Analysis of this protein sequence reveals the following:

Possible site: 33 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1429 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
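
The localisation predictions reported in these Examples follow a fixed line layout, so they can be tabulated mechanically. The following is a minimal Python sketch under that assumption only; the names `parse_final_results` and `best_site` are illustrative and are not part of the prediction program, and the pattern tolerates the stray spaces that appear inside some of the printed certainty values.

```python
import re

# One "Final Results" line, e.g.
#   bacterial cytoplasm --- Certainty=0.1429 (Affirmative) < succ>
# The value group also accepts digits separated by stray spaces ("0 . 1429").
RESULT_LINE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>[\d .]+?)\s*\((?P<call>[^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (site, certainty, call) tuples in order of appearance."""
    results = []
    for match in RESULT_LINE.finditer(text):
        certainty = float(match.group("value").replace(" ", ""))
        results.append((match.group("site"), certainty, match.group("call").strip()))
    return results


def best_site(results):
    """Return the tuple with the highest certainty value."""
    return max(results, key=lambda item: item[1])
```

Applied to the three lines above, `best_site` would select the cytoplasm entry, which is the only one reported as "Affirmative".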

A related GBS nucleic acid sequence <SEQ ID 8529> which encodes amino acid sequence <SEQ ID 8530> 
was also identified. There is a signal peptide at residues 1-19. The protein has homology with the following 
sequences in the GENPEPT database: 

>GP:AAA86746 GB:U32164 NAD(P)H-dependent dihydroxyacetone-phosphate
reductase [Bacillus subtilis]
Identities = 177/333 (53%) , Positives = 241/333 (72%) 

Query: 18 QKIAVLGPGSWGTAIAQVIOTJNGHEvIttWGOT 77 
25 +K+ +LG GSWGTALA VL DNG+EV +W + +1 +IN H N+ Y ++ L + IK 

Sbjct: 2 KKVTMLGAGSWGTAIALVLTOTGNEVC^^ 61 

Query: 78 YTNLEEAINNVDSILFVVPTKVTRLVAKQVANLLKHKVVLMHASKGLEPGTHERLSTILE 137
T+++EA+++ D I+ VPTK R V +Q + K V +H SKG+EP + R+S I+E
Sbjct: 62 TTDMKEAVSDADVIIVAVPTKAIREVLRQAVPFITKKAVFVHVSKGIEPDSLLRISEIME 121

Query: 138 EEISEQYRSDIVVVSGPSHAEEAIVRDITLITAASKDIEAAKYVQKLFSNHYFRLYTNTD 197

E+ R DIW+SGPSHAEE +R T +TA+SK + AA+ VQ LF NH FR+YTN D 
Sbjct: 122 IELPSDTORDIVVLSGPSHAEEVGLRHATTVTASSKSMRAAEEVQDLFINHNFRVYTNPD 181 

Query: 198 WGVETAGALKNIIAVGAGALHGLGYGDNAKAAIITRGLAEITRLGVQLGADPLTFSGLS 257

++GVE GALKNIIA+ AG GLGYGDNAKAA+ITRGLAEI RLG ++G +PLTFSGL+ 
Sbjct: 182 IIGVEIGGALKNIIALAAGITDGLGYGDNAKAALITRGLAEIARLGTKMGGNPLTFSGLT 241

40 Query: 258 GVGDLIWGTSTOSRNV^GDALGRGEKLEDIEKNMGWIEGISTTKVAYEIAQNLNVYM 317 

GVGDLIVT TSVHSRNWRAG+ LG+G KLED+ + MGMV+EG+ TTK AY++++ +V M 
Sbjct: 242 GVGDLIWCTSTOSRNWRAGNLLGKGYKLEDVLEEMGMVvEGTOTTKAAYQLSKKYDVKM 301 

Query: 318 PITEAIYKSIYEGANIKDSILDMMSNEFRSENE 350 
45 PITEA+++ ++ G ++ ++ +M+ E E 

Sbjct: 302 PITEAIiHQvXiFNGQKVETAVESLMARGKTHEME 334 

A related DNA sequence was identified in S.pyogenes <SEQ ID 853> which encodes the amino acid 
sequence <SEQ ID 854>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0882 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 
Identities = 287/338 (84%) , Positives = 316/338 (92%)

Query: 15 MTKQKIAVLGPGSWGTALAQvlNDNGHEVRLWGNVVEQIEEimTSIHTNQRYFKDITLDSK 74 

MTKQK+A+LGPGSWGTAL+QVXJSIDNGH+VRLWGN+ +QIEEINT HTN+ YFKDI LD 
Sbjct: 1 MTKQCTAILGPGSWGTALSQVLiroNGHDvI^WGNIPDQIEEIOTKHTNRHYFKIIIVLD^ 60 

Query: 75 IKAYTNLEEAINNVDSILFVVPTKVTRLVAKQVANLLKHKVVLMHASKGLEPGTHERLST 134

I A +L +A+++VD++LFVVPTKVTRLVA+QVA +L HKVV+MHASKGLEP THERLST
Sbjct: 61 ITATLDLGQALSDVDAVLFVVPTKVTRLVARQVAAILDHKVVVMHASKGLEPETHERLST 120

Query: 135 ILEEEISEQYRSDIVVVSGPSHAEEAIVRDITLITAASKDIEAAKYVQKLFSNHYFRLYT 194

ILEEEI +RS++VVVSGPSHAEE IVRDITLITAASKDIEAAKYVQ LFSNHYFRLYT
Sbjct: 121 ILEEEIPAHFRSEVVVVSGPSHAEETIVRDITLITAASKDIEAAKYVQSLFSNHYFRLYT 180

Query: 195 NTDWGVETAGALKNIIAVGAGALHGLGYGDNAKAAIITRGLAEITRLGVQLGADPLTFS 254

NTDV+GVETAGALKNIIAVGAGALHGLGYGDNAKAA+ITRGLAEITRLGV+LGADPLT+S
Sbjct: 181 NTDVIGVETAGALKNIIAVGAGALHGLGYGDNAKAAVITRGLAEITRLGVKLGADPLTYS 240

Query: 255 GLSGVGDLIVTGTSVHSRNWRAGDALGRGEKLEDIEKNMGMVIEGISTTKVAYEIAQNLN 314
GLSGVGDLIVTGTSVHSRNWRAG ALGRGEKLEDIE+NMGMVIEGI+TTKVAYEIAQ+L

Sbjct: 241 GLSGVGDLIVTGTSVHSRNWRAGAALGRGEKLEDIERNMGMVIEGIATTKVAYEIAQDLG 300

Query: 315 VYMPITEAIYKSIYEGANIKDSILDMMSNEFRSENEWH 352
VYMPIT AIYKSIYEGA+IK+SIL MMSNEFRSENEWH
Sbjct: 301 VYMPITTAIYKSIYEGADIKESILGMMSNEFRSENEWH 338
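
The "Identities" and "Positives" counts quoted for each alignment can be recounted from the three rows of an alignment block. The sketch below is a minimal Python illustration, not the alignment program's own code: the function name `alignment_stats` is hypothetical, and positives are taken as identities plus the "+" marks on the middle row, which assumes the middle row is intact and column-aligned with the two sequences.

```python
def alignment_stats(query, midline, sbjct):
    """Count identities, positives and gap columns in one alignment row.

    All three strings must have the same length. '-' marks a gap in either
    sequence; '+' on the middle row marks a conservative substitution.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("rows must be the same length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    positives = identities + midline.count("+")
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return identities, positives, gaps, len(query)


# Final row of the GAS/GBS alignment above (Query 315-352, Sbjct 301-338).
query = "VYMPITEAIYKSIYEGANIKDSILDMMSNEFRSENEWH"
mid   = "VYMPIT AIYKSIYEGA+IK+SIL MMSNEFRSENEWH"
sbjct = "VYMPITTAIYKSIYEGADIKESILGMMSNEFRSENEWH"
print(alignment_stats(query, mid, sbjct))
```

Summing these per-row counts over every row of an alignment should reproduce the totals printed in its header line, provided no row is damaged.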

SEQ ID 8530 (GBS291) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 59 (lane 5; MW 38.9kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 77 (lane 2; MW 64kDa). 

GBS291-GST was purified as shown in Figure 226, lane 10-11.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 268 

A DNA sequence (GBSx0293) was identified in S.agalactiae <SEQ ID 855> which encodes the amino acid
sequence <SEQ ID 856>. This protein is predicted to be glucose-1-phosphate uridylyltransferase (gtaB).
Analysis of this protein sequence reveals the following:

Possible site: 25 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAA28714 GB:AB001562 glucose-1-phosphate uridylyltransferase
[Streptococcus mutans]
Identities = 263/296 (88%) , Positives = 285/296 (95%) 

Query: 2 KVRKAVIPAAGLGTRFLPATKALAKEMLPIVDKPTIQFIVEEALKSGIEDILVVTGKSKR 61

KVRKAVIPAAGLGTRFLPATKALAKEMLPIVDKPTIQFIVEEALKSGIEDILVVTGKSKR
Sbjct: 5 KVRKAVIPAAGLGTRFLPATKALAKEMLPIVDKPTIQFIVEEALKSGIEDILVVTGKSKR 64

Query: 62 SIEDHFDSNFELEYNLKEKGKNELLKLVDETTGIRLHFIRQSHPRGLGDAVLQAKAFVGN 121 
SIEDHFDSNFELEYNL++KGK +LLKLV++TT I LHFIRQSHPRGLGDAVLQAKAFVGN

Sbjct: 65 SIEDHFDSNFELEYNLEQKGKTDLLKLVNDTTAINLHFIRQSHPRGLGDAVLQAKAFVGN 124 

Query: 122 EPFVVMLGDDLMDITNNKVIPLTKQLINDFEATHASTIAVMEVPHEDVSAYGVIAPQGEG 181




EPFVVMLGDDLMDIT++K IPLT+QL+ND+E THASTIAVMEVPHEDVSAYGVIAPQGEG
Sbjct: 125 EPFVVMLGDDLMDITDDKAIPLTRQLMNDYEETHASTIAVMEVPHEDVSAYGVIAPQGEG 184

Query: 182 VNGLYSVNTFVEKPSPEEAPSNLAIIGRYLLTPEIFNILETQKPGAGNEIQLTDAIDTLN 241
V+GLYSV+TFVEKP+P+EAPSNLAIIGRYLLTPEIF ILETQ+PGAGNE+QLTDAIDTLN

Sbjct: 185 VSGLYSVDTFVEKPAPKEAPSNLAIIGRYLLTPEIFTILETQEPGAGNEVQLTDAIDTLN 244

Query: 242 KTQRVFARKFTGDRYDVGDKFGFMKTSIDYALQHPQVKDDLKKYIIDLGKSLEKTS 297
KTQRVFAR+F G RYDVGDKFGFMKTSIDYAL+HPQVK+DLK YII+LGK L++ S
Sbjct: 245 KTQRVFAREFKGKRYDVGDKFGFMKTSIDYALKHPQVKEDLKAYIIELGKKLDQKS 300

A related DNA sequence was identified in S.pyogenes <SEQ ID 857> which encodes the amino acid 
sequence <SEQ ID 858>. Analysis of this protein sequence reveals the following: 

Possible site: 26
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 257/295 (87%) , Positives = 277/295 (93%) 

Query: 2 KVRKAVIPAAGLGTRFLPATKALAKEMLPIVDKPTIQFIVEEALKSGIEDILVVTGKSKR 61

KVRKA+IPAAGLGTRFLPATKALAKEMLPIVDKPTIQFIVEEALKSGIE+ILVVTGK+KR
Sbjct: 3 KVRKAIIPAAGLGTRFLPATKALAKEMLPIVDKPTIQFIVEEALKSGIEEILVVTGKAKR 62

Query: 62 SIEDHFDSNFELEYNLKEKGKNELLKLVDETTGIRLHFIRQSHPRGLGDAVLQAKAFVGN 121
SIEDHFDSNFELEYNL+ KGKNELLKLVDETT I LHFIRQSHPRGLGDAVLQAKAFVGN

Sbjct: 63 SIEDHFDSNFELEYNLQAKGKNELLKLVDETTAINLHFIRQSHPRGLGDAVLQAKAFVGN 122

Query: 122 EPFVVMLGDDLMDITNNKVIPLTKQLINDFEATHASTIAVMEVPHEDVSAYGVIAPQGEG 181
EPFVVMLGDDLMDITN PLTKQL+ D++ THASTIAVM+VPHEDVS+YGVIAPQG+
Sbjct: 123 EPFVVMLGDDLMDITNASAKPLTKQLMEDYDKTHASTIAVMKVPHEDVSSYGVIAPQGKA 182



Query: 182 VNGLYSVNTFVEKPSPEEAPSNLAIIGRYLLTPEIFNILETQKPGAGNEIQLTDAIDTLN 241

V GLYSV+TFVEKP PE+APS+LAIIGRYLLTPEIF ILE Q PGAGNE+QLTDAIDTLN
Sbjct: 183 VKGLYSVDTFVEKPQPEDAPSDLAIIGRYLLTPEIFGILERQTPGAGNEVQLTDAIDTLN 242

Query: 242 KTQRVFARKFTGDRYDVGDKFGFMKTSIDYALQHPQVKDDLKKYIIDLGKSLEKT 296

KTQRVFAR+F G+RYDVGDKFGFMKTSIDYAL+HPQVK+DLK YII LGK+LEK+
Sbjct: 243 KTQRVFAREFKGNRYDVGDKFGFMKTSIDYALEHPQVKEDLKNYIIKLGKALEKS 297
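
Each alignment is introduced by a header line of the form "Identities = a/b (p%) , Positives = c/d (q%)", optionally followed by a "Gaps" field. As a minimal Python sketch for collecting these figures across Examples (the name `parse_stats` is hypothetical, and the pattern assumes only the layout visible in this document):

```python
import re

# One "Name = count/length (percent%)" field of an alignment header line.
STAT_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_stats(line):
    """Map each field name (lower case) to (count, aligned_length, printed_percent)."""
    return {
        name.lower(): (int(count), int(length), int(percent))
        for name, count, length, percent in STAT_FIELD.findall(line)
    }


header = "Identities = 257/295 (87%) , Positives = 277/295 (93%)"
stats = parse_stats(header)
print(stats["identities"], stats["positives"])
```

The printed percentage is returned as reported rather than recomputed, since the rounding convention used by the alignment program is not stated here.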

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 269 

A DNA sequence (GBSx0294) was identified in S.agalactiae <SEQ ID 859> which encodes the amino acid 
sequence <SEQ ID 860>. Analysis of this protein sequence reveals the following: 

Possible site: 42

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.94 Transmembrane 28 - 44 ( 27 - 45)

Final Results

bacterial membrane Certainty=0.2975 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 
>GP:CAB15143 GB:Z99120 similar to ABC transporter (lipoprotein)
[Bacillus subtilis] 
Identities = 148/346 (42%) , Positives = 222/346 (63%) , Gaps = 16/346 (4%) 



Query: 31 LTLLSLSVLTLTACGNRSDKSAN---KSDIKVAMVTNQGGVDDKSFNQSAWEGLQKWGKK 87
++L+ + L ACGN S + K+ VAMVT+ GGVDDKSFNQSAWEG+Q +GK+
Sbjct: 1 MSLVIAAGTILGACGNSEKSSGSGEGKHKFSVAMVTDVGGVDDKSFNQSAWEGIQAFGKE 60

Query: 88 KGLTKG-NGFDYFQSSNESDHANNLDTAASSGYNLIFGIGFGLHDTIEKVSENNKDVKYV 146
GL KG NG+DY QS +++D+ NL+ A ++LI+G+G+ + D+I ++++ K+ +
Sbjct: 61 NGLKKGKNGYDYLQSKSDADYTTNLNKLAREMFDLIYGVGYLMEDSISEIADQRKNTNFA 120

Query: 147 IVDDIIKGKENVASVTFADNEAAYLAGVAAAKTTKTKTVGFIGGMEGVVVKRFEAGFKAG 206
I+D ++ K+NVAS+TF + E ++L GVAAA ++K+ +GF+GGME ++K+FE GF+AG
Sbjct: 121 IIDAVVD-KDNVASITFKEQEGSFLVGVAAALSSKSGKIGFVGGMESELIKKFEVGFRAG 179

Query: 207 VKSIDPAIKVAVSYAGSFTDAAKGKTIAATQYATGVDVIYQAAGGTGAGIFSEAKTENET 266
V++++P V V YAG F A GK A + Y +GVDVIY +AG TG G+F+EAK
Sbjct: 180 VQAVNPKAVVEVKYAGGFDKADVGKATAESMYKSGVDVIYHSAGATGTGVFTEAK---NL 236

Query: 267 RKESNK--VWVIGVDRDQSQEGNYVSKDGKKANFVLASTIKEVGKSLQSVAELTEKKQYP 324
+KE K VWVIGVD+DQ EG +G N L S +K+V ++ V + ++P
Sbjct: 237 KKEDPKRDVWVIGVDKDQYAEGQV---EGTDDNVTLTSMVKKVDTVVEDVTKKASDGKFP 293

Query: 325 GGKVTVFGLKDSGVDI--KEHQLSSEGSVAVKKAKEDIVSGKIQVP 368
GG+ +GL GV I + LS + AV K K+ I+ G +++P
Sbjct: 294 GGETLTYGLDQDGVGISPSKQNLSDDVIKAVDKWKKKIIDG-LEIP 338





There is also homology to SEQ ID 862. 

A related GBS gene <SEQ ID 8531> and protein <SEQ ID 8532> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: 20 Crend: 3 
Sequence Pattern: CGNR 

SRCFLG : 0 

McG: Length of UR: 19

Peak Value of UR: 2.31 

Net Charge of CR: 2 
McG: Discrim Score: 5.09 
GvH: Signal Score (-7.5): -3.29 

Possible site: 19 
>>> May be a lipoprotein

Amino Acid Composition: calculated from 21 
ALOM program count: 0 value: 5.20 threshold: 0.0 
PERIPHERAL Likelihood =5.20 90 
modified ALOM score: -1.54 

*** Reasoning Step: 3 

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

52.8/73.9% over 239aa 

Listeria 

monocytogenes 

SP|Q48754| CD4+ T CELL-STIMULATING ANTIGEN PRECURSOR. Insert characterized
GP|7240601|gb|AAB35725.2| |S80336 CD4+ T cell-stimulating antigen Insert characterized

ORF02225(385 - 1086 of 1710)

SP|Q48754|TCSA_LISMO(8 - 247 of 268) CD4+ T CELL-STIMULATING ANTIGEN
PRECURSOR. GP|7240601|gb|AAB35725.2| |S80336 CD4+ T cell-stimulating antigen {Listeria
monocytogenes}




%Match = 21.7 

%Identity = 52.7 %Similarity = 73.8

Matches = 125 Mismatches = 59 Conservative Sub.s = 50 

294 324 354 384 414 444 465 489 

NFLVffiK*NKVC*MIFLCYDR^FLCDYNLLGGSFSVNRKIIGLTLLSLSVLTLTACGNRSD- - -KSANKS- -DIKVAMVT 

: l==: I = I 111= II I =11 I lllll 

MKKRTFALALSMIIASGVILGACGSSSDDKKSSDDKSSKDFTVAMVT 
10 20 30 40 



519 549 579 606 636 666 696 726 

NQGGVDDKSFNQSAWEGLQKWGKKKGLTKG-NGFDYFQSSNESDHANNLDTAASSGYNLIFGIGFGLHDTIEKVSENNKD 

' IIIIMIIIIIIIIIIMI = || =|: = |:||::|:|: Ihll I I = I I = I I I = I I I I = I I = 
DTGGVDDRSFNQSAWEGLQKFGKANDMEKGTDGYmLQSASEADYKTOLOTAWSDYDLIYGIGYKLKDAIEEVSKQKPK 
15 60 70 80 90 100 110 120 

756 786 816 846 876 906 936 966 

VKYVIVDDIIKGKENVASOTFAIJNEAAV 

== llll I :=ll 1= I II: =11 II I llll h llllll lll|:==l == I II 

20 NQFAIVDDTIDDRDNWSIGFKDNDGSYLVGOTAGLTTKTNKVGFVGGVKGTVIDRFEAGFTAGVKAVNPNAQIDVQYAN 
140 150 160 170 180 190 200 



996 1026 1056 1086 1116 1146 1176 1206 

SFTDAAKGKTIAATQYATGVDVIYQAAGGTGAGIFSEAKTENETRKESNKOTWIGVDRDQSQEGNYVSKDGKKANFVIA.S 

I I ||: Ih: I = = I I I I I = = I I I I I I |:|:||| = = 
DFAKADKGQQIASSMYSSGVDVIFHAAGGTGNGVFAEAKNLKKKDLQMVPYGNSKLGCFGG 
220 230 240 250 260 



A related GBS nucleic acid sequence <SEQ ID 10947> which encodes amino acid sequence <SEQ ID 10948>
was also identified.

SEQ ID 8532 (GBS108) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 38 (lane 7; MW 39.6kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 41 (lane 9; MW 64.6kDa). 

The GBS108-GST fusion product was purified (Figure 202, lane 9) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 273), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 270 

A DNA sequence (GBSx0295) was identified in S.agalactiae <SEQ ID 863> which encodes the amino acid
sequence <SEQ ID 864>. Analysis of this protein sequence reveals the following: 



Possible site: 35

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-12.74 Transmembrane 206 - 222 ( 197 - 224)
INTEGRAL Likelihood = -3.72 Transmembrane 174 - 190 ( 171 - 194)
INTEGRAL Likelihood = -3.19 Transmembrane 98 - 114 ( 98 - 116)
INTEGRAL Likelihood = -1.54 Transmembrane 120 - 136 ( 120 - 139)
INTEGRAL Likelihood = -0.90 Transmembrane 157 - 173 ( 157 - 173)

Final Results

bacterial membrane Certainty=0.6095 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:
>GP:CAB90755 GB:AJ400707 hypothetical protein [Streptococcus uberis]
Identities = 126/218 (57%) , Positives = 166/218 (75%) 

Query: 8 KEYPTTVLLVSLTTLVFLLMQLTYGSQAESSQVIFQFGGIQGDYLKAYPTNLWRLISPIF 67
KE P T +S+T L+F++MQ+ YGS A+S QV+FQFGG+ G +K+ P+ LWRL++PIF

Sbjct: 5 KEKPVTFFFLSVTILLFIVMQVFYGSWAKSPQVVFQFGGMFGLVVKSMPSQLWRLVTPIF 64

Query: 68 VHIGWEHFLLNGLALYFVGQMGESIWGSLRFLILYILSGLMGNIFTLFFTPHVVAAGAST 127
+HIGWEHFL+N L LYFVGQ+ ESIWGS FL+LY+LSG+MGN+ TLFFTPHVVAAGAST
Sbjct: 65 IHIGWEHFLINSLTLYFVGQIAESIWGSRFFLLLYVLSGIMGNVLTLFFTPHVVAAGAST 124

Query: 128 SLFGVFSAIAIAGYFGKNPYLKQVGKSYQVMILLNLFFNIFTPGVSLAGHVGGLVGGVLV 187

SLFG+F+AI + GYFG N LK +GKSYQ +I+LNL N+F P V + GH+GG +GG L
Sbjct: 125 SLFGLFAAIVVVGYFGHNQLLKSIGKSYQTLIILNLVMNLFMPNVGIVGHLGGALGGALA 184



Query: 188 AIFLTKQNGSLLFKTWQSILALMIFIIVSISLIGLSLV 225

A+FL + LF Q AL+ ++ +++ LI LSL+

Sbjct: 185 AVFLPTLLDAELFTKKQKTSALLSYLTLALVLITLSLM 222



A related DNA sequence was identified in S.pyogenes <SEQ ID 865> which encodes the amino acid
sequence <SEQ ID 866>. Analysis of this protein sequence reveals the following: 



Possible site: 43
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.92 Transmembrane 214 - 230 ( 212 - 232)
INTEGRAL Likelihood = -5.36 Transmembrane 135 - 151 ( 128 - 153)
INTEGRAL Likelihood = -1.81 Transmembrane 101 - 117 ( 100 - 117)
INTEGRAL Likelihood = -1.44 Transmembrane 183 - 199 ( 182 - 199)
INTEGRAL Likelihood = -0.53 Transmembrane 166 - 182 ( 166 - 182)

Final Results

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB90755 GB:AJ400707 hypothetical protein [Streptococcus uberis]
Identities = 72/128 (56%) , Positives = 94/128 (73%) 

Query: 106 FLLLYVLSGVMGNAFTFWLTPETVAAGASTSLFGLFAAIVVLSFLGKNQALKDLGKSYQT 165
FLLLYVLSG+MGN T + TP VAAGASTSLFGLFAAIVV+ + G NQ LK +GKSYQT

Sbjct: 95 FLLLYVLSGIMGNVLTLFFTPHVVAAGASTSLFGLFAAIVVVGYFGHNQLLKSIGKSYQT 154
Sbjct: 95 FLLLYVLSGIMGNVLTLFFTPHWAAGASTSLFGLFAAIVWGYFGHNQLLKSIGKSYQT 154 

Query: 166 LI VVNLLMNLFMPNVS^GHIGGWGGALLSIVFPTKl^VIWKKTKRMLALVSYGIILV 225 
LI++NL+MNLFMPNV + GH+GG +GGAL ++ PT + K ++ AL+SY + + 

Sbjct: 155 LIILNLVMNLFMPNVGIVGHLGGALGGALAAVFLPTLLDAELFTKKQKTSALLSYLTLAL 214

Query: 226 GVLVLGFL 233 

++ L + 
Sbjct: 215 VLITLSLM 222 


An alignment of the GAS and GBS proteins is shown below: 

Identities = 63/132 (47%) , Positives = 92/132 (68%) 

Query: 94 GSLRFLILYILSGLMGNIFTLFFTPHVVAAGASTSLFGVFSAIAIAGYFGKNPYLKQVGK 153
G FL+LY+LSG+MGN FT + TP VAAGASTSLFG+F+AI + + GKN LK +GK

Sbjct: 102 GLTPFLLLYVLSGVMGNAFTFWLTPETVAAGASTSLFGLFAAIVVLSFLGKNQALKDLGK 161

Query: 154 SYQVMILLNLFFNIFTPGVSLAGHVGGLVGGVLVAIFLTKQNGSLLFKTWQSILALMIFI 213
SYQ +I++NL N+F P VS+AGH+GG+VGG L++I + + K + +LAL+ +
Sbjct: 162 SYQTLIVVNLLMNLFMPNVSMAGHIGGVVGGALLSIWPTKMRVIWKKTKRMLALVSYG 221

Query: 214 IVSISLIGLSLV 225

I+ + ++ L +
Sbjct: 222 IILVGVLVLGFL 233
A further corresponding DNA sequence was identified in S.pyogenes <SEQ ID 9083> which encodes the
amino acid sequence <SEQ ID 9084>. Analysis of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.70 Transmembrane 12 - 28 ( 7 - 30)

Final Results

bacterial membrane Certainty=0.4079 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS sequences follows: 

Score = 74.5 bits (180), Expect = 5e-16

Identities = 37/96 (38%) , Positives = 48/96 (49%)

Query: 1 MTQLLKRYPXXXXXXXXXXXXXXAMQVVYGHLATGAQAIYQVGGMFGLLVKAMPDQLWRL 60
M + K YP MQ+ YG A +Q I+Q GG+ G +KA P LWRL

Sbjct: 3 MKKFAKEYPTTVLLVSLTTLVFLLMQLTYGSQAESSQVIFQFGGIQGDYLKAYPTNLWRL 62

Query: 61 VTPXXXXXXXXXXXVNGLTLYFVGQIVEDLWGSRLF 96

++P +NGL LYFVGQ+ E +WGS F

Sbjct: 63 ISPIFVHIGWEHFLLNGLALYFVGQMGESIWGSLRF 98

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 271 

A DNA sequence (GBSx0296) was identified in S.agalactiae <SEQ ID 867> which encodes the amino acid
sequence <SEQ ID 868>. Analysis of this protein sequence reveals the following:

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.2055 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAA28715 GB:AB001562 hypothetical protein [Streptococcus mutans]

Identities = 96/173 (55%) , Positives = 129/173 (74%) 

Query: 1 MEKKLLRKEVLITLKSQPQAYKSEVDCKLLEAFIKTKAYQNSCVIATYLSFDYEYNTQLL 60
M KK R +V+ LK Q +A K D +LLE I+ +AYQ + VIATYL+F +E++T LL
Sbjct: 1 MMKKDYRTQVIEDLKKQDKAKKVLRDEQLLEELIQLEAYQKAHVIATYLAFPFEFDTSLL 60

Query: 61 IKQALCDGKRVLVPKTYPKGKMIFVNYQKDNLRTTPFGLLEPVNDRAVEKASIDLIHVPG 120

I+QA D K ++VPKTYP+ KMIFV Y + +L+ T FGL EP ++ A+EK++IDLIHVPG
Sbjct: 61 IEQAQRDNKSIVVPKTYPQRKMIFVVYDEADLQITKFGLKEPRSEEALEKSAIDLIHVPG 120

Query: 121 LIFNNKGFRIGYGAGYFDRYLSDFEGDTISTIYRCQRQDFVEEKHDVAVKEVL 173

L FNN+G+RIG+GAGY+D+YL+DF+GDT+STIY Q+ F D+ VKEVL

Sbjct: 121 LAFNNEGYRIGFGAGYYDQYLADFQGDTVSTIYSFQQFTFEPSFFDIPVKEVL 173






A related GBS nucleic acid sequence <SEQ ID 10925> which encodes amino acid sequence <SEQ ID 
10926> was also identified. 




No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 272 

A DNA sequence (GBSx0297) was identified in S.agalactiae <SEQ ID 869> which encodes the amino acid
sequence <SEQ ID 870>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.44 Transmembrane 161 - 177 ( 161 - 177)
INTEGRAL Likelihood = -0.22 Transmembrane 29 - 45 ( 28 - 45)

Final Results

bacterial membrane Certainty=0.1574 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
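
The "INTEGRAL" lines reported for membrane proteins list a likelihood score, a predicted transmembrane span and a second residue range in parentheses. The following Python sketch reads such lines into records ordered along the sequence. It is an illustration only: the names `Segment` and `parse_segments` are hypothetical, and because the meaning of the parenthesised range is not defined in this text, it is simply carried through as `paren_start` and `paren_end`.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int        # first residue of the predicted transmembrane span
    end: int          # last residue of the predicted transmembrane span
    paren_start: int  # range printed in parentheses (meaning not stated here)
    paren_end: int


INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return all INTEGRAL segments found in text, sorted by start residue."""
    segments = [
        Segment(float(score), int(a), int(b), int(c), int(d))
        for score, a, b, c, d in INTEGRAL_LINE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)
```

Sorting by start residue is a presentation choice; the program output itself lists segments from most to least negative likelihood.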

A related GBS nucleic acid sequence <SEQ ID 9305> which encodes amino acid sequence <SEQ ID 9306> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD33517 GB:AF132127 glucose-6-phosphate isomerase
[Streptococcus mutans]
Identities = 344/401 (85%) , Positives = 374/401 (92%) 

Query: 1 MDLPENYDKEEFSRIQKAAEKIKSDSEVLVVIGIGGSYLGAKAAIDFLNNHFANLQTAEE 60
++LP+NYDKEEF+RI+KAAEKIKSDSEVLVVIGIGGSYLGA+AAIDFLN+ F NL+ EE

Sbjct: 49 LNLPQNYDKEEFARIKKAAEKIKSDSEVLVVIGIGGSYLGARAAIDFLNSSFVNLENKEE 108

Query: 61 RKAPQILYAGNSISSTYLADLVEYVQDKEFSVNVISKSGTTTEPAIAFRVFKELLVKKYG 120
RKAPQILYAGNSISS YLADLV+YV DK+FSVNVISKSGTTTEPAIAFRVFK+LLVKKYG
Sbjct: 109 RKAPQILYAGNSISSNYLADLVDYVADKDFSVNVISKSGTTTEPAIAFRVFKDLLVKKYG 168

Query: 121 QEFJUJKRIYATTDKVKGAVKVEADANNWETFWPDNVGGRFSVLTAVGLLPIAASGADIT 180 

QEEAN+RIYATTD+VKGAVKVEADAN WETFWPD+VGGRF+VLTAVGLLPIAASGAD+ 
Sbjct: 169 QEEANQRIYATTDRVKGAVKVEADANGWETFVVPDSVGGRFTVLTAVGLLPIAASGADLD 228 

35 

Query: 181 ALMEGANAARKDLSSDKISENIAYQYAAVRNVLYRKGYITEILANYEPSLQYFGEWWKQL 240 

LM GA AAR+D SS ++SEN AYQYAA+RN+LYRKGY+TE+LANYEPSLQYF EWWKQL 
Sbjct: 229 QLMAGAFAARQDYSSAELSENEAYQYAAIRNILYRKGY\7TEVLANYEPSLQYFSEWWKQL 288 

40 Query: 241 AGESEGKDQKGIYPTSANFSTDLHSLGQFIQEGYRNLFETVWVEKPRKNVTIPELTEDL 300 

AGESEGKDQKGIYPTSANFSTDLHSLGQFIQEG RNLFETV+RVEK RKN+ +PE EDL 
Sbjct: 289 AGESEGKDQKGIYPTSANFSTDLHSLGQFIQEGNRNLFETVIRVEKARKNILVPEAAEDL 348 

Query: 301 DGLGYLCGKDVDFVNKKATDGVLIJfflTDGGVPNMFVTLPTQDAYTLGYTIYFFELAIGLS 360 
45 DGL YLQGKDVDFVNKKATDGVLLAHTDGGVPN F+T+P QD +TLGY IYFFELAIGLS 

Sbjct: 349 DGIAYLQGKDVDFVNKKATDGVLLAHTDGGTOOT 408 

Query: 361 GYLNSVNPFDQPGVEAYKRNMFALLGKPGFEELSAELNARL 401 
GYLN VNPFDQPGVEAYK+NMFALLGKPGFEEL AELNARL 
Sbjct: 409 GYLNGVNPFDQPGVEAYKKNMFALLGKPGFEELGAELNARL 449

A related DNA sequence was identified in S.pyogenes <SEQ ID 871> which encodes the amino acid 
sequence <SEQ ID 872>. Analysis of this protein sequence reveals the following: 

Possible site: 31

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.44 Transmembrane 209 - 225 ( 209 - 225)
INTEGRAL Likelihood = -0.22 Transmembrane 77 - 93 ( 76 - 93)

Final Results

bacterial membrane Certainty=0.1574 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAD33517 GB:AF132127 glucose-6-phosphate isomerase
[Streptococcus mutans]

Identities = 369/449 (82%) , Positives = 408/449 (90%) 

Query: 1 MSHITFDYSKVLESFAGQHEIDFLQGQVTEADKLLREGTGPGSDFLGWLDLPENYDKDEF 60 
M+HI FDYSKVL F HE+D++Q OVT AD+ LR+GTGPG++ GWL+LP+NYDK+EF 
15 Sbjct: 1 MTHIKFDYSKTOGKFIASHELDYIQMQVTAADEALRKGTGPGAEMrGWIaNLPQNYDKEEF 60 

Query: 61 ARI LTAAEKI KADSE VLWIGIGGS YLGAKAAI DFLNHHFANLQTAKERKAPQI LYAGNS 120 

ARI AAEKIK+DSEVLWIGIGGSYLGA+AAIDFLN F NL+ +ERKAPQILYAGNS 
Sbjct: 61 ARIKKAAEKIKSDSEVLWIGIGGSYLGARAAIDFLNSSFVNLENKEERKAPQILYAGNS 120 

20 

Query: 121 1SSTYLADLVEYVQDKEFSVNVISKSGTTTEPAIAFRVFKELLVKKYGQEEANKRIYATT 180 

ISS YLADLV+YV DK+FSVNVISKSGTTTEPAIAFRVFK+LLVKKYGQEEAN+RIYATT 
Sbjct: 121 ISSNYLADLVDYVADKDFSVNVISKSGTTTEPAIAFRVFKDLLVKKYGQEEANQRIYATT 180 

25 Query: 181 DKVKGAVKvFJffiANNWETFWPDNVGGRFSVLTAVGLLPIAASGADITALMEGANAARKD 240 

D+WGAVKVEADAN WETFWPD+VGGRF+VLTAVGLLPIAASGAD+ LM GA AAR+D 
Sbjct: 181 DRvKGAVKVEADANGWETFWPDSVGGRFTVLTAVGLLPIAASGADLDQLMAGAEAARQD 240 

Query: 241 LSSDKISENIAYQYAAVRNVLYRKGYITEILANYEPSLQYFGEWWKQLAGESEGKDQKGI 300 
30 SS ++SEN AYQYAA+RN+LYRKGY+TE+LANYEPSLQYF EWWKQLAGESEGKDQKGI 

Sbjct: 241 YSSAELSENEAYQYAAIRNILYRKGYVTEVLANYEPSLQYFSEWWKQLAGESEGKDQKGI 300 

Query: 301 YPTSANFSTDLHSLGQFIQEGXKNLFETrc^ 360 
YPTSANFSTDLHSLGQFIQEG RNLFETVIRV+ RKN+++PE AEDLDGL YLQGKDVD 
35 Sbjct: 301 YPTSANFSTDLHSLGQFIQEGNRNLFETVIRVEKARKNILVPEAAEDLDGLAYLQGKDVD 360 

Query: 361 FVNKKATDGVLLAHTDGGVPNMFVTLPAQDEFTLGYTIYFFEIAIAVSGYMNAVNPFDQP 420 

FVNKKATDGVLIiAHTDGGVPN F+T+P QDEFTLGY IYFFELAI +SGY+N VNPFDQP 
Sbjct: 361 FVNKKATDGVLLAHTDGGVPNTFLTIPEQDEFTLGYVIYFFELAIGLSGYLNGVNPFDQP 420 



40 

Query: 421 GVEAYKRNMFALLGKPGFEALSAELNARL 449 

GVEAYK+NMFALLGKPGFE L AELNARL 
Sbjct: 421 GVEAYKKNMFALLGKPGFEELGAELNARL 449 

The protein has homology with the following sequences in the databases:

>GP:CAB90755 GB:AJ400707 hypothetical protein [Streptococcus
uberis]

Identities = 58/91 (63%) , Positives = 69/91 (75%)

Query: 6 KRYPITIFLLGLTGLIFIAMQVVYGHLATGAQAIYQVGGMFGLLVKAMPDQLWRLVTPIF 65

K P+T F L +T L+FI MQV YG A Q ++Q GGMFGL+VK+MP QLWRLVTPIF
Sbjct: 5 KEKPVTFFFLSVTILLFIVMQVFYGSWAKSPQVVFQFGGMFGLVVKSMPSQLWRLVTPIF 64

Query: 66 IHIGFGHFFVNGLTLYFVGQIVEDLWGSRLF 96
IHIG+ HF +N LTLYFVGQ+ E +WGSR F

Sbjct: 65 IHIGWEHFLINSLTLYFVGQLAESIWGSRFF 95

An alignment of the GAS and GBS proteins is shown below: 

Identities = 380/401 (94%) , Positives = 392/401 (96%) 



60 



Query: 1 MDLPENYDKEEFSRIQKAAEKIKSDSEVLVVIGIGGSYLGAKAAIDFLNNHFANLQTAEE 60

+DLPENYDK+EF+RI AAEKIK+DSEVLVVIGIGGSYLGAKAAIDFLN+HFANLQTA+E
Sbjct: 49 LDLPENYDKDEFARILTAAEKIKADSEVLVVIGIGGSYLGAKAAIDFLNHHFANLQTAKE 108
Query: 61 RKAPQILYAGNSISSTYLADLVEYVQDKEFSVNVISKSGTTTEPAIAFRVFKELLVKKYG 120

RKAPQILYAGNSISSTYLADLVEYVQDKEFSVNVISKSGTTTEPAIAFRVFKELLVKKYG
Sbjct: 109 RKAPQILYAGNSISSTYLADLVEYVQDKEFSVNVISKSGTTTEPAIAFRVFKELLVKKYG 168

Query: 121 QEEANKRIYATTDKVKGAVKVEADANNWETFVVPDNVGGRFSVLTAVGLLPIAASGADIT 180

QEEANKRIYATTDKVKGAVKVEADANNWETFVVPDNVGGRFSVLTAVGLLPIAASGADIT
Sbjct: 169 QEE&NKRIYATTDKVKGAVKVEADANNWETFW 228 

Query: 181 ALMEGANAARKDLSSDKISENIAYQYAAVRNVLYRKGYITEILANYEPSLQYFGEWWKQL 240 

ALMEGANAARKDLSSDKISENIAYQYAAVRNVLYRKGYITEILANYEPSLQYFGEWWKQL
Sbjct: 229 ALMEGANAARKDLSSDKISENIAYQYAAVRNVLYRKGYITEILANYEPSLQYFGEWWKQL 288 

Query: 241 AGESEGKDQKGIYPTSANFSTDLHSLGQFIQEGYRNLFETVVRVEKPRKNVTIPELTEDL 300

AGESEGKDQKGIYPTSANFSTDLHSLGQFIQEGYRNLFETV+RV+ PRKNV IPEL EDL
Sbjct: 289 AGESEGKDQKGIYPTSANFSTDLHSLGQFIQEGYRNLFETVIRVDNPRKNVIIPELAEDL 348

Query: 301 DGLGYLQGKDVDFVNKKATDGVLIAHTDGGVPNMFVTLPTQDAYTLGYTIYFFELAIGLS 360

DGLGYLQGKDVDFVNKKATDGVLIAHTDGGVPNMFVTLP QD +TLGYTIYFFELAI +S
Sbjct: 349 DGLGYLQGKDVDFVNKKATDGVLIAHTDGGVPNMFVTLPAQDEFTLGYTIYFFELAIAVS 408

Query: 361 GYLNSVNPFDQPGVEAYKRNMFALLGKPGFEELSAELNARL 401

GY+N+VNPFDQPGVEAYKRNMFALLGKPGFE LSAELNARL
Sbjct: 409 GYMNAVNPFDQPGVEAYKRNMFALLGKPGFEALSAELNARL 449

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 273 

A DNA sequence (GBSx0298) was identified in S.agalactiae <SEQ ID 873> which encodes the amino acid 
sequence <SEQ ID 874>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.66 Transmembrane 654 - 670 ( 653 - 671)
INTEGRAL Likelihood = -1.65 Transmembrane 113 - 129 ( 113 - 129)

Final Results

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9463> which encodes amino acid sequence <SEQ ID 9464> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA81906 GB:U04863 alcohol dehydrogenase 2 [Entamoeba 
histolytica] 

Identities = 536/864 (62%), Positives = 663/864 (76%), Gaps = 3/864 (0%) 

Query: 20 ETTDVAIAIDTLVQNGLKALDEMR--QLNQEQVDYIVAKASVAALDAHGELALHAVEETG 77 
+T V I+ LV+ AL E + QE++DYIV KASVAALD H LA AVEETG 
Sbjct: 5 QTMTVDEHINQLTOKAQVALKEYLKPEYTQEKIDYIVKKASVAALDQHCALAAAAVEETG 64 

Query: 78 RGVFEDKATKNLFACEHVVNNMRHTKTVGVIEEDDVTGLTLIAEPVGVVCGITPTTNPTS 137 
RG+FEDKATKN+FACEHV + MRH KTVG+I D + G+T IAEPVGVVCG+TP TNPTS 
Sbjct: 65 RGIFEDKATKNIFACEHVTHEMRHAKTVGIINVDPLYGITEIAEPVGVVCGVTPVTNPTS 124 

Query: 138 TAIFKSLISLKTRNPIIFAFHPSAQESSAHAARIVRDAAIAAGAPENCVQWIEQPSIDAT 197 
TAIFKSLIS+KTRNPI+F+FHPSA + S AA+IVRDAAIAAGAPENC+QWIE I+A+ 
Sbjct: 125 TAIFKSLISIKTRNPIVFSFHPSALKCSIMAAKIVRDAAIAAGAPENCIQWIEFGGIEAS 184 

Query: 198 NALMNHDGIATILATGGNAMVKAAYSCGKPALGVGAGNVPAYVEKSANIRQAAHDIVMSK 257 
N LMNH G+ATILATGGNAMVKAAYS GKPALGVGAGNVP Y+EK+ NI+QAA+D+VMSK 
Sbjct: 185 NKLMNHPGVATILATGGNAMVKAAYSSGKPALGVGAGNVPTYIEKTCNIKQAANDVVMSK 244 

Query: 258 SFDNGMVCASEQAVIIDKEIYKEFVEEFKSYHTYFVNKKEKALLEEFCFGAKANSKNCAG 317 
SFDNGM+CASEQA IIDKEIY + VEE K+ YF+N++EKA LE+F FG A S + 
Sbjct: 245 SFDNGMICASEQAAIIDKEIYDQVVEEMKTLGAYFINEEEKAKLEKFMFGVNAYSADVNN 304 

Query: 318 AKLNPNIVGKSAVWIAEQAGFTVPEGTNILAAECTEVSEKEPLTREKLSPVIAVLKAEST 377 
A+LNP G S W AEQ G VPE NI+ A C EV EPLTREKLSPV+A+LKAE+T 
Sbjct: 305 ARLNPKCPGMSPQWFAEQVGIKVPEDCNIICAVCKEVGPNEPLTREKLSPVLAILKAENT 364 

Query: 378 EDGVEKARQMVEFNGLGHSAAIHTKDADLAREFGTRIRAIRVIWNSPSTFGGIGDVYNAF 437 
+DG++KA MVEFNG GHSAAIH+ D + ++ ++A R++ N+PS+ GGIG +YN 
Sbjct: 365 QDGIDKAEAMVEFNGRGHSAAIHSNDKAVVEKYALTMKACRILHNTPSSQGGIGSIYNYI 424 

Query: 438 LPSLTLGCGSYGRNSVGDNVSAINLLNIKKVGRRRNNMQWFKVPSKTYFERDSIQYLQKC 497 
PS TLGCGSYG NSV NV+ NLLNIK++ RRNN+QWF+VP K +FE SI+YL + 
Sbjct: 425 WPSFTLGCGSYGGNSVSANVTYHNLLNIKRLADRRNNLQWFRVPPKIFFEPHSIRYLAEL 484 

Query: 498 RDVERVMIVTDHAMVELGFLDRIIEQLDLRRNKVVYQIFAEVEPDPDITTVMKGTDLMRT 557 
+++ ++ IV+D M +LG++DR+++ L R N+V +IF +VEPDP I TV KG +M T 
Sbjct: 485 KELSKIFIVSDRMMYKLGYVDRVMDVLKRRSNEVEIEIFIDVEPDPSIQTVQKGLAVMNT 544 

Query: 558 FKPDTIIALGGGSPMDAAKVMWLFYEQPEVDFHDLVQKFMDIRKRAFKFPELGKKTKFVA 617 
F PD IIA+GGGS MDAAK+MWL YE PE DF + QKF+D+RKRAFKFP +GKK + + 
Sbjct: 545 FGPDNIIAIGGGSAMDAAKIMWLLYEHPEADFFAMKQKFIDLRKRAFKFPTMGKKARLIC 604 

Query: 618 IPTTSGTGSEVTPFAVISDKANNRKYPIADYSLTPTVAIVDPALVMTVPGFIAADTGMDV 677 
IPTTSGTGSEVTPFAVISD +KYP+ADYSLTP+VAIVDP M++P ADTG+DV 
Sbjct: 605 IPTTSGTGSEVTPFAVISDHETGKKYPLADYSLTPSVAIVDPMFTMSLPKRAIADTGLDV 664 

Query: 678 LTHATEAYVSQMANDYTDGLALQAIKIVFDYLERSVKDADFEAREKMHNASTMAGMAFAN 737 
L HATEAYVS MAN+YTDGLA +A+K+VF+ L +S + D EAREKMHNA+T+AGMAFA+ 
Sbjct: 665 LVHATEAYVSVMANEYTDGLAREAVKLVFENLLKSY-NGDLEAREKMHNAATIAGMAFAS 723 

Query: 738 AFLGISHSMAHKIGAQFHTVHGRTNAILLPYVIRYNGTRPAKTATWPKYNYYRADEKYQD 797 
AFLG+ HSMAHK+GA FH HGR A+LLP+VIRYNG +P K A WPKYN+Y+AD++Y + 
Sbjct: 724 AFLGMDHSMAHKVGAAFHLPHGRCTAVLLPHVIRYNGQKPRKLAMWPKYNFYKADQRYME 783 

Query: 798 IAKLLGLPAATPEEAVESYAKAVYDLGTRLGIKMNFRDQGIDEKEWKEKSRELAFLAYED 857 
+A+++GL TP E VE++AKA +L F+ IDE W K E+A LA+ED 
Sbjct: 784 IAQMVGLKCNTPAEGVEAFAKACEELMKATETITGFKKANIDEAAWMSKVPEMALLAFED 843 

Query: 858 QCSPANPRLPMVDHMQEIIEDAYY 881 
QCSPANPR+PMV M++I++ AYY 
Sbjct: 844 QCSPANPRVPMVKDMEKILKAAYY 867 
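
The percentages printed in the alignment header above can be checked against the raw counts. The sketch below is illustrative only: it assumes the header follows the "Name = count/total (pct%)" pattern and that the printed percentage is the integer part of count/total, which reproduces the three figures of this particular header. The function name `check_header` is hypothetical, and no claim is made that every header in this document follows the same rounding.

```python
import re

# Matches fields such as "Identities = 536/864 (62%)".
HEADER_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_header(header):
    """Return {field: (count, total, stated_pct, truncated_pct)} for one header line."""
    out = {}
    for name, count, total, stated in HEADER_RE.findall(header):
        count, total, stated = int(count), int(total), int(stated)
        # Assumption: the percentage is truncated, not rounded to nearest.
        out[name] = (count, total, stated, count * 100 // total)
    return out


header = "Identities = 536/864 (62%), Positives = 663/864 (76%), Gaps = 3/864 (0%)"
for name, (count, total, stated, computed) in check_header(header).items():
    print(f"{name}: {count}/{total} stated {stated}% computed {computed}%")
```

For this header the stated and computed values agree for all three fields (62%, 76% and 0%).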

A related DNA sequence was identified in S.pyogenes <SEQ ID 875> which encodes the amino acid 
sequence <SEQ ID 876>. Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.66 Transmembrane 643 - 659 ( 642 - 660) 
INTEGRAL Likelihood = -1.81 Transmembrane 102 - 118 ( 102 - 118) 

Final Results 

bacterial membrane --- Certainty=0.2466 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAA81906 GB:U04863 alcohol dehydrogenase 2 [Entamoeba 
histolytica] 

Identities = 535/870 (61%), Positives = 669/870 (76%), Gaps = 3/870 (0%) 

Query: 6 NTVETTSVSVTIDALVQKGLAALEEMRKLD--QEQVDYIVAKASVAALDAHGELAKHAYE 63 
+T +T +V I+ LV+K AL+E K + QE++DYIV KASVAALD H LA A E 
Sbjct: 2 STQQTMTVDEHINQLTOKAQVALKEYLKPEYTQEKIDYIVKKASVAALDQHCALAAAAVE 61 

Query: 64 ETGRGVFEDKATKHLFACEHVVNNMRHQKTVGIIEEDDVTGLTLIAEPVGVICGITPTTN 123 
ETGRG+FEDKATK++FACEHV + MRH KTVGII D + G+T IAEPVGV+CG+TP TN 
Sbjct: 62 ETGRGIFEDKATKNIFACEHVTHEMRHAKTVGIINVDPLYGITEIAEPVGVVCGVTPVTN 121 

Query: 124 PTSTAIFKSLISLKTRNPIIFAFHPSAQESSAHAARIVRDAAIAAGAPENCVQWVETPSL 183 
PTSTAIFKSLIS+KTRNPI+F+FHPSA + S AA+IVRDAAIAAGAPENC+QW+E + 
Sbjct: 122 PTSTAIFKSLISIKTRNPIVFSFHPSALKCSIMAAKIVRDAAIAAGAPENCIQWIEFGGI 181 

Query: 184 EATNALMNHDGIATILATGGNAMVKAAYSCGKPALGVGAGNVPAYVEKSANIRQAAHDIV 243 
EA+N LMNH G+ATILATGGNAMVKAAYS GKPALGVGAGNVP Y+EK+ NI+QAA+D+V 
Sbjct: 182 EASNKLMNHPGVATILATGGNAMVKAAYSSGKPALGVGAGNVPTYIEKTCNIKQAANDVV 241 

Query: 244 MSKSFDNGMVCASEQAVIIDKEIYDDFVAEFKSYHTYFVNKKEKALLEEFCFGAKANSKN 303 
MSKSFDNGM+CASEQA IIDKEIYD V E K+ YF+N++EKA LE+F FG A S + 
Sbjct: 242 MSKSFDNGMICASEQAAIIDKEIYDQVVEEMKTLGAYFINEEEKAKLEKFMFGVNAYSAD 301 

Query: 304 CAGAKLNPNIVGKPATWIAEQAGFTVPEGTNILAAECKEVSENEPLTREKLSPVIAVLKS 363 
A+LNP G W AEQ G VPE NI+ A CKEV NEPLTREKLSPV+A+LK+ 
Sbjct: 302 VNNARLNPKCPGMSPQWFAEQVGIKVPEDCNIICAVCKEVGPNEPLTREKLSPVLAILKA 361 

Query: 364 ESREDGVEKARQMVEFNGLGHSAAIHTADAELAKEFGTRIRAIRVIWNSPSTFGGIGDVY 423 
E+ +DG++KA MVEFNG GHSAAIH+ D + +++ ++A R++ N+PS+ GGIG +Y 
Sbjct: 362 ENTQDGIDKAEAMVEFNGRGHSAAIHSNDKAVVEKYALTMKACRILHNTPSSQGGIGSIY 421 

Query: 424 NAFLPSLTLGCGSYGRNAVGDNVSAINLLNIKKVGRRRNNMQWFKVPSKTYFERDSIQYL 483 
N PS TLGCGSYG N+V NV+ NLLNIK++ RRNN+QWF+VP K +FE SI+YL 
Sbjct: 422 NYIWPSFTLGCGSYGGNSVSANVTYHNLLNIKRLADRRNNLQWFRVPPKIFFEPHSIRYL 481 

Query: 484 QKCRDVERVMIVTDHAMVELGFLDRIIEQLDLRRNKVVYQIFAEVEPDPDITTVMKGTEL 543 
+ +++ ++ IV+D M +LG++DR+++ L R N+V +IF +VEPDP I TV KG + 
Sbjct: 482 AELKELSKIFIVSDRMMYKLGYVDRVMDVLKRRSNEVEIEIFIDVEPDPSIQTVQKGLAV 541 

Query: 544 MRTFKPDTIIALGGGSPMDAAKVMWLFYEQPEVDFHDLVQKFMDIRKRAFKFPELGKKTK 603 
M TF PD IIA+GGGS MDAAK+MWL YE PE DF + QKF+D+RKRAFKFP +GKK + 
Sbjct: 542 MNTFGPDNIIAIGGGSAMDAAKIMWLLYEHPEADFFAMKQKFIDLRKRAFKFPTMGKKAR 601 

Query: 604 FVAIPTTSGTGSEVTPFAVISDKANNRKYPIADYSLTPTVAIVDPALVLTVPGFIAADTG 663 
+ IPTTSGTGSEVTPFAVISD +KYP+ADYSLTP+VAIVDP +++P ADTG 
Sbjct: 602 LICIPTTSGTGSEVTPFAVISDHETGKKYPLADYSLTPSVAIVDPMFTMSLPKRAIADTG 661 

Query: 664 MDVLTHATEAYVSQMANDFTDGLALQAIKIVFDNLEKSVKTADFEAREKMHNASTMAGMA 723 
+DVL HATEAYVS MAN++TDGLA +A+K+VF+NL KS D EAREKMHNA+T+AGMA 
Sbjct: 662 LDVLVHATEAYVSVMANEYTDGLAREAVKLVFENLLKSY-NGDLEAREKMHNAATIAGMA 720 

Query: 724 FANAFLGISHSMAHKIGAQFHTVHGRTNAILLPYVIRYNGTRPAKTATWPKYNYYRADEK 783 
FA+AFLG+ HSMAHK+GA FH HGR A+LLP+VIRYNG +P K A WPKYN+Y+AD++ 
Sbjct: 721 FASAFLGMDHSMAHKVGAAFHLPHGRCVAVLLPHVIRYNGQKPRKLAMWPKYNFYKADQR 780 

Query: 784 YQDIAKLLGLPASTPEEAVESYAKAVYDLGCRVGIQMNFKAQGIDENEWKEHSRELAYLA 843 
Y ++A+++GL +TP E VE++AKA +L FK IDE W E+A LA 
Sbjct: 781 YMEIAQMVGLKCNTPAEGVEAFAKACEELMKATETITGFKKANIDEAAWMSKVPEMALLA 840 

Query: 844 YEDQCSPANPRLPMVDHMQEIIEDAYYGYA 873 
+EDQCSPANPR+PMV M++I++ AYY A 
Sbjct: 841 FEDQCSPANPRVPMVKDMEKILKAAYYPIA 870 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 827/880 (93%), Positives = 852/880 (95%) 

Query: 12 MTEKTKAVETTDVALAIDTLVQNGLKALDEMRQLNQEQVDYIVAKASVAALDAHGELALH 71 
MTE VETT V++ ID LVQ GL AL+EMR+L+QEQVDYIVAKASVAALDAHGELA H 
Sbjct: 1 MTEGHNTVETTSVSVTIDALVQKGLAALEEMRKLDQEQVDYIVAKASVAALDAHGELAKH 60 

Query: 72 AVEETGRGVFEDKATKNLFACEHVVNNMRHTKTVGVIEEDDVTGLTLIAEPVGVVCGITP 131 
A EETGRGVFEDKATK+LFACEHVVNNMRH KTVG+IEEDDVTGLTLIAEPVGV+CGITP 
Sbjct: 61 AYEETGRGVFEDKATKHLFACEHVVNNMRHQKTVGIIEEDDVTGLTLIAEPVGVICGITP 120 

Query: 132 TTNPTSTAIFKSLISLKTRNPIIFAFHPSAQESSAHAARIVRDAAIAAGAPENCVQWIEQ 191 
TTNPTSTAIFKSLISLKTRNPIIFAFHPSAQESSAHAARIVRDAAIAAGAPENCVQW+E 
Sbjct: 121 TTNPTSTAIFKSLISLKTRNPIIFAFHPSAQESSAHAARIVRDAAIAAGAPENCVQWVET 180 

Query: 192 PSIDATNALMNHDGIATILATGGNAMVKAAYSCGKPALGVGAGNVPAYVEKSANIRQAAH 251 
PS++ATNALMNHDGIATILATGGNAMVKAAYSCGKPALGVGAGNVPAYVEKSANIRQAAH 
Sbjct: 181 PSLEATNALMNHDGIATILATGGNAMVKAAYSCGKPALGVGAGNVPAYVEKSANIRQAAH 240 

Query: 252 DIVMSKSFDNGMVCASEQAVIIDKEIYKEFVEEFKSYHTYFVNKKEKALLEEFCFGAKAN 311 
DIVMSKSFDNGMVCASEQAVIIDKEIY +FV EFKSYHTYFVNKKEKALLEEFCFGAKAN 
Sbjct: 241 DIVMSKSFDNGMVCASEQAVIIDKEIYDDFVAEFKSYHTYFVNKKEKALLEEFCFGAKAN 300 

Query: 312 SKNCAGAKLNPNIVGKSAVWIAEQAGFTVPEGTNILAAECTEVSEKEPLTREKLSPVIAV 371 
SKNCAGAKLNPNIVGK A WIAEQAGFTVPEGTNILAAEC EVSE EPLTREKLSPVIAV 
Sbjct: 301 SKNCAGAKLNPNIVGKPATWIAEQAGFTVPEGTNILAAECKEVSENEPLTREKLSPVIAV 360 

Query: 372 LKAESTEDGVEKARQMVEFNGLGHSAAIHTKDADLAREFGTRIRAIRVIWNSPSTFGGIG 431 
LK+ES EDGVEKARQMVEFNGLGHSAAIHT DA+LA+EFGTRIRAIRVIWNSPSTFGGIG 
Sbjct: 361 LKSESREDGVEKARQMVEFNGLGHSAAIHTADAELAKEFGTRIRAIRVIWNSPSTFGGIG 420 

Query: 432 DVYNAFLPSLTLGCGSYGRNSVGDNVSAINLLNIKKVGRRRNNMQWFKVPSKTYFERDSI 491 
DVYNAFLPSLTLGCGSYGRN+VGDNVSAINLLNIKKVGRRRNNMQWFKVPSKTYFERDSI 
Sbjct: 421 DVYNAFLPSLTLGCGSYGRNAVGDNVSAINLLNIKKVGRRRNNMQWFKVPSKTYFERDSI 480 

Query: 492 QYLQKCRDVERVMIVTDHAMVELGFLDRIIEQLDLRRNKVVYQIFAEVEPDPDITTVMKG 551 
QYLQKCRDVERVMIVTDHAMVELGFLDRIIEQLDLRRNKVVYQIFAEVEPDPDITTVMKG 
Sbjct: 481 QYLQKCRDVERVMIVTDHAMVELGFLDRIIEQLDLRRNKVVYQIFAEVEPDPDITTVMKG 540 

Query: 552 TDLMRTFKPDTIIALGGGSPMDAAKVMWLFYEQPEVDFHDLVQKFMDIRKRAFKFPELGK 611 
T+LMRTFKPDTIIALGGGSPMDAAKVMWLFYEQPEVDFHDLVQKFMDIRKRAFKFPELGK 
Sbjct: 541 TELMRTFKPDTIIALGGGSPMDAAKVMWLFYEQPEVDFHDLVQKFMDIRKRAFKFPELGK 600 

Query: 612 KTKFVAIPTTSGTGSEVTPFAVISDKANNRKYPIADYSLTPTVAIVDPALVMTVPGFIAA 671 
KTKFVAIPTTSGTGSEVTPFAVISDKANNRKYPIADYSLTPTVAIVDPALV+TVPGFIAA 
Sbjct: 601 KTKFVAIPTTSGTGSEVTPFAVISDKANNRKYPIADYSLTPTVAIVDPALVLTVPGFIAA 660 

Query: 672 DTGMDVLTHATEAYVSQMANDYTDGLALQAIKIVFDYLERSVKDADFEAREKMHNASTMA 731 
DTGMDVLTHATEAYVSQMAND+TDGLALQAIKIVFD LE+SVK ADFEAREKMHNASTMA 
Sbjct: 661 DTGMDVLTHATEAYVSQMANDFTDGLALQAIKIVFDNLEKSVKTADFEAREKMHNASTMA 720 

Query: 732 GMAFANAFLGISHSMAHKIGAQFHTVHGRTNAILLPYVIRYNGTRPAKTATWPKYNYYRA 791 
GMAFANAFLGISHSMAHKIGAQFHTVHGRTNAILLPYVIRYNGTRPAKTATWPKYNYYRA 
Sbjct: 721 GMAFANAFLGISHSMAHKIGAQFHTVHGRTNAILLPYVIRYNGTRPAKTATWPKYNYYRA 780 

Query: 792 DEKYQDIAKLLGLPAATPEEAVESYAKAVYDLGTRLGIKMNFRDQGIDEKEWKEKSRELA 851 
DEKYQDIAKLLGLPA+TPEEAVESYAKAVYDLG R+GI+MNF+ QGIDE EWKE SRELA 
Sbjct: 781 DEKYQDIAKLLGLPASTPEEAVESYAKAVYDLGCRVGIQMNFKAQGIDENEWKEHSRELA 840 

Query: 852 FLAYEDQCSPANPRLPMVDHMQEIIEDAYYGYEERPGRRK 891 
+LAYEDQCSPANPRLPMVDHMQEIIEDAYYGY ERPGRRK 
Sbjct: 841 YLAYEDQCSPANPRLPMVDHMQEIIEDAYYGYAERPGRRK 880 

A related GBS gene <SEQ ID 8533> and protein <SEQ ID 8534> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -4.68 
GvH: Signal Score (-7.5): -2.48 

Possible site: 21 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -2.66 threshold: 0.0 

INTEGRAL Likelihood = -2.66 Transmembrane 100 - 116 ( 99 - 117) 
PERIPHERAL Likelihood = 3.61 173 

modified ALOM score: 1.03 
*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

SEQ ID 8534 (GBS432) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 173 (lane 5; MW 66kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 77 (lane 7; MW 41kDa). 

GBS432-GST was purified as shown in Figure 223, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 274 

A DNA sequence (GBSx0299) was identified in S.agalactiae <SEQ ID 877> which encodes the amino acid 
sequence <SEQ ID 878>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3444 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database, but there is 
homology to SEQ ID 880. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 275 

A DNA sequence (GBSx0300) was identified in S.agalactiae <SEQ ID 881> which encodes the amino acid 
sequence <SEQ ID 882>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -8.39 Transmembrane 74 - 90 ( 69 - 94) 
INTEGRAL Likelihood = -5.31 Transmembrane 168 - 184 ( 163 - 186) 
INTEGRAL Likelihood = -4.83 Transmembrane 34 - 50 ( 29 - 52) 
INTEGRAL Likelihood = -0.75 Transmembrane 202 - 218 ( 202 - 219) 

Final Results 

bacterial membrane --- Certainty=0.4354 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA17305 GB:AL021926 hypothetical protein Rv0111 [Mycobacterium 
tuberculosis] 

Identities = 70/218 (32%), Positives = 104/218 (47%), Gaps = 12/218 (5%) 

Query: 9 VRITGLLLVLLYHFFKNSFPGGFVGVDIFFTFSGFLITALLIDEFSKTKKIDFVSFCRRR 68 
+R + LVL H GGF+GVD FF SGFLIT+LL+DE +T +ID F RR 
Sbjct: 39 LRAIAVALVLASHGGIPGMGGGFIGVDAFFVLSGFLITSLLLDELGRTGRIDLSGFWIRR 98 

Query: 69 FYRIFPPLVLMVLVTIPFVFLVKSDFRASIGSQIMTALGFTSNFYEILTGGNYESQFI-P 127 

R+ P LVLMVL L + S + A +T+N+ + +Y +Q P 

Sbjct: 99 JU^LLPALVLMVLTVSAARALFPDQALTGLRSDAIAWLOTANWRFVAQIWDYFTQGAPP 158 

Query: 128 HLFVHTWSLSIEVHFYVLWGL TVWLLSKRSKDQKQLRGTLFLISMGIFGVSFLTMF 183 

HTWSL +E +YV+W L LL+ R++ ++ R T+ + F ++ L 

Sbjct: 159 SPLQHTWSLGVEEQYYVWPLLLIGATLLLAARaR-RRCRRATVGGVRFAAFLIASLGTM 217 

Query: 184 VRAFFVDNFST IYFSTDSHI FPFFLGAMVATI 215 

A F++ IYF T + +G+ A + 

Sbjct: 218 ASATAAVAFTSAATRDRIYFGTDTRAQALLIGSAAAAL 255 

A related DNA sequence was identified in S.pyogenes <SEQ ID 879> which encodes the amino acid 
sequence <SEQ ID 880>. Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-10.83 Transmembrane 325 - 341 ( 313 - 346) 

INTEGRAL Likelihood = -9.29 Transmembrane 237 - 253 ( 234 - 258) 

INTEGRAL Likelihood = -7.91 Transmembrane 166 - 182 ( 162 - 188) 

INTEGRAL Likelihood = -6.10 Transmembrane 72 - 88 ( 68 - 92) 

INTEGRAL Likelihood = -4.09 Transmembrane 264 - 280 ( 260 - 281) 

INTEGRAL Likelihood = -2.87 Transmembrane 371 - 387 ( 370 - 390) 

INTEGRAL Likelihood = -2.66 Transmembrane 34 - 50 ( 32 - 50) 

INTEGRAL Likelihood = -1.91 Transmembrane 3 - 19 ( 3-19) 

INTEGRAL Likelihood = -0.85 Transmembrane 136 - 152 ( 136 - 154) 
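
The INTEGRAL lines listed above can be tabulated to give the number, order and length of the predicted membrane-spanning segments. The sketch below is an illustration under stated assumptions: the regular expression is modelled on the line layout shown in this document, only the core segment (the first pair of coordinates) is read, and the function name `parse_tm_segments` is hypothetical.

```python
import re

# Reads the likelihood and the core segment coordinates from lines such as
# "INTEGRAL Likelihood =-10.83 Transmembrane 325 - 341 ( 313 - 346)".
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_tm_segments(text):
    """Return (likelihood, start, end) tuples ordered along the sequence."""
    segs = [(float(lik), int(start), int(end)) for lik, start, end in TM_RE.findall(text)]
    return sorted(segs, key=lambda seg: seg[1])


report = """
INTEGRAL Likelihood =-10.83 Transmembrane 325 - 341 ( 313 - 346)
INTEGRAL Likelihood = -9.29 Transmembrane 237 - 253 ( 234 - 258)
INTEGRAL Likelihood = -1.91 Transmembrane 3 - 19 ( 3-19)
"""
for likelihood, start, end in parse_tm_segments(report):
    # Print start, end, core length in residues, and likelihood.
    print(start, end, end - start + 1, likelihood)
```

In the three lines used here each core segment spans 17 residues, and the most negative likelihood (-10.83) belongs to the segment at 325 - 341.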



bacterial membrane --- Certainty=0.5331 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 167/226 (73%) , Positives = 195/226 (85%) 

Query: 1 MRIKWFSLVRITGLLLVLLYHFFKNSFPGGFVGVDIFFTFSGFLITALLIDEFSKTKKID 60 

MRIKWFS VR+TGLLLVLLYHFFKN FPGGF+GVDIFFTFSG+LITALLIDE++K + ID 
Sbjct: 1 MRIKWFSFVRVTGLLLVLLYHFFKNVFPGGFIGVDIFFTFSGYLITALLIDEYTKKESID 60 

Query: 61 FVSFCRRRFYRIFPPLVLMVLVTIPFVFLVKSDFRASIGSQIMTALGFTSNFYEILTGGN 120 

+ F +RRFYRI PPLVLM+L+TIPF FL+K DF A+IGSQI LGFT+N YEILTG + 
Sbjct: 61 IIGFLKRRFYRIVPPLVLMILLTIPFTFLIKKDFIANIGSQITAVLGFTTNIYEILTGSS 120 

Query: 121 YESQFIPHLFVHTWSLSIEVHFYVLWGLTVWLLSKRSKDQKQLRGTLFLISMGIFGVSFL 180 

YESQFIPHLFVHTWSL+IEVHFY+ WG+ VWLL++R + QKQLRG LFLIS+GIF +SFL 
Sbjct: 121 YESQFIPHLFVHTWSLAIEVHFYLFWGVEWLLARRKETQKQLRGLLFLISLGIFAISFL 180 

Query: 181 TMFVRAFFVDNFSTIYFSTLSHIFPFFLGAMVATISGIREITGRFK 226 

+MF+R+F NFS IYFS+LSH FPFFLGAM ATI+GI E T RF+ 
Sbjct: 181 SMFIRSFMTSNFSLIYFSSLSHSFPFFLGAMFATITGINETTVRFQ 226 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 276 

A DNA sequence (GBSx0302) was identified in S.agalactiae <SEQ ID 883> which encodes the amino acid 
sequence <SEQ ID 884>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

!GB:AE004818 hypothetical protein [Pseudomonas aerug. . 

!GB:AE004818 hypothetical protein [Pseudomonas aerug. . 

!GB:AE004818 hypothetical protein [Pseudomonas aerug.. 

!GB:AE004818 hypothetical protein [Pseudomonas aerug.. 

!GB:AE004818 hypothetical protein [Pseudomonas aerug.. 

>GP:AAG07403 GB:AE004818 hypothetical protein [Pseudomonas aeruginosa] 
Identities = 33/80 (41%) , Positives = 50/80 (62%) 

Query: 45 KYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHGWSYTGDFKKGQPDGQ 104 

+Y G +V+ + G+G+L Y+NG +Y G F +G+ G GT+ G Y+G F G DGQ 
Sbjct: 39 RYRGELVDGRLEGQGRLDYDNGAWYAGRFEHGLLHGHGTWQGADGSRYSGGFAAGLFDGQ 98 

Query: 105 GRLNAKNKKVYKGTFKQGIY 124 

GRL + VY+G F+QG++ 
Sbjct: 99 GRLAMADGSVYQGGFRQGLF 118 
Identities = 31/91 (34%) , Positives = 46/91 (50%) , Gaps = 2/91 (2%) 

Query: 34 QGVFSYDGGKIKYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHGWSYT 93 

QG YD G YG + +GG +GYGF G+F+G+G G Y 

Sbjct: 52 QGRLDYDNGAW-YAGRFEHGLLHGHGTWQGADGSRYSGGFAAGLFDGQGRLAMADGSVYQ 110 

Query: 94 GDFKKGQPDGQGRLNAKNKKVYKGTFKQGIY 124 

G F++G DG+G L + + Y+G F++G+Y 
Sbjct: 111 GGFRQGLFDGEGSLEQQGTR-YRGGFRKGLY 140 
Identities = 31/91 (34%) , Positives = 42/91 (46%) , Gaps = 1/91 (1%) 

Query: 32 SSQGVFSYDGGKIKYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHGWS 91 

S QG G +Y GS + G+G + G+ Y G F +G GKG + G 

Sbjct: 141 SGQGTLDGSDGS-RYQGSFRQGRLEGEGSFSDSQGNQYAGTFRDGQLNGKGRWSGPDGDR 199 

Query: 92 YTGDFKKGQPDGQGRLNAKNKKVYKGTFKQG 122 

Y G FK Q GQGR + + V+ G F +G 
Sbjct: 200 YVGQFKDNQFHGQGRYESASGDVWIGRFSEG 230 
Identities = 31/91 (34%) , Positives = 45/91 (49%) , Gaps = 4/91 (4%) 

Query: 34 QGVFSYDGGK----IKYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHG 89 

QG+F +G +Y G +G+G L +G Y+G F G EG+G+F G 

Sbjct: 115 QGLFDGEGSLEQQGTRYRGGFRKGLYSGQGTLDGSDGSRYQGSFRQGRLEGEGSFSDSQG 174 

Query: 90 WSYTGDFKKGQPDGQGRLNAKNKKVYKGTFK 120 

Y G F+ GQ +G+GR + + Y G FK 
Sbjct: 175 NQYAGTFRDGQLNGKGRWSGPDGDRYVGQFK 205 
Identities = 28/87 (32%) , Positives = 45/87 (51%) , Gaps = 1/87 (1%) 

Query: 34 QGVFSYDGGKIKYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHGWSYT 93 

+G FS G +Y G+ + + GKG+ + +GD Y G F + F G+G + S G + 
Sbjct: 166 EGSFSDSQGN-QYAGTFRDGQLNGKGRWSGPDGDRYVGQFKDNQFHGQGRYESASGDVWI 224 

Query: 94 GDFKKGQPDGQGRLNAKNKKVYKGTFK 120 

G F +G +G G L + Y+G F+ 
Sbjct: 225 GRFSEGALNGPGELLGADGSRYRGGFQ 251 
Identities = 28/89 (31%) , Positives = 43/89 (47%) , Gaps = 2/89 (2%) 

Query: 34 QGVFSYDGGKIKYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHGWSYT 93 

QG + G + Y G G+G L + G Y+G F G++ G+GT G Y 

Sbjct: 98 QGRLAMADGSV-YQGGFRQGLFDGEGSLE-QQGTRYRGGFRKGLYSGQGTLDGSDGSRYQ 155 

Query: 94 GDFKKGQPDGQGRLNAKNKKVYKGTFKQG 122 

G F++G+ +G+G + Y GTF+ G 

Sbjct: 156 GSFRQGRLEGEGSFSDSQGNQYAGTFRDG 184 
Identities = 25/80 (31%), Positives = 37/80 (46%) 

Query: 45 KYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHGWSYTGDFKKGQPDGQ 104 

+YVG ++ G+G+ +GD + GFG GG + GYGF+ + GQ 
Sbjct: 199 RYVGQFKDNQFHGQGRYESASGDVWIGRFSEGALNGPGELLGADGSRYRGGFQFWRFHGQ 258 

Query: 105 GRLNAKNKKVYKGTFKQGIY 124 

G L + Y+G F G Y 
Sbjct: 259 GLLEQLDGTRYEGGFAAGAY 278 

A related DNA sequence was identified in S.pyogenes <SEQ ID 885> which encodes the amino acid 
sequence <SEQ ID 886>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-13.16 Transmembrane 20 - 36 ( 12 - 41) 

Final Results 

bacterial membrane --- Certainty=0.6265 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:BAA16606 GB:D90899 hypothetical protein [Synechocystis sp.] 
Identities = 37/89 (41%), Positives = 49/89 (54%), Gaps = 6/89 (6%) 

Query: 48 KGRMHYT GYVINHKMNGEGKLVYPNGDIYEGTFKDGLFEGKGTFTAKTGWLYNG 101 

KG YT G V+ ++NG GK Y NGD YEGT K+G +G+G F G Y G 

Sbjct: 141 KGTFIYTNGDRCSGTWCGEmGSGKCEYNNGDQYEGTLKNGQPDGEGIFRFAAGGEYEG 200 

Query: 102 EFHKGQANGKGVLKAKNNKVYKGIFKQGI 130 

EF G+ +G+G N ++G FKQG+ 
Sbjct: 201 EFQSGEFSGQGTRIFANGNRFQGQFKQGL 229 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 68/126 (53%) , Positives = 93/126 (72%) 

Query: 1 MKNFKITRTHLEILSLIIIVVFGLSVFTLTTSSQGVFSYDGGKIKYVGSIVNHHMTGKGK 60 
+K + ITR LEI+S+I+I+V +SVF++ S++ +YD G++ Y G ++NH M G+GK 
Sbjct: 8 VKKWSITRAKLEIVSVIVILVCAISVFSVRISNKTSLTYDKGRMHYTGYVINHKMNGEGK 67 

Query: 61 LTYENGDYYKGDFVNGVFEGKGTFVSVHGWSYTGDFKKGQPDGQGRLNAKNKKVYKGTFK 120 

L Y NGD Y+G F +G+FEGKGTF + GW Y G+F KGQ +G+G L AKN KVYKG FK 
Sbjct: 68 LVYPNGDIYEGTFKDGLFEGKGTFTAKTGWLYNGEFHKGQANGKGVLKAKNNKVYKGIFK 127 

Query: 121 QGIYQK 126 

QGI+QK 
Sbjct: 128 QGIFQK 133 
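
The "Identities" figure of an alignment is the number of aligned columns in which the query and subject rows carry the same residue. As a minimal sketch (the function names `count_identities` and `identity_line` are hypothetical), the count can be reproduced for the final block of the alignment above, in which the GBS row QGIYQK is aligned with the GAS row QGIFQK. The "+" marks for conservative substitutions depend on a substitution matrix and are deliberately not reproduced here.

```python
def count_identities(query, subject):
    """Count aligned columns where both rows carry the same residue (gaps never count)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(q == s and q != "-" for q, s in zip(query, subject))


def identity_line(query, subject):
    # Middle line of an alignment block, restricted to identities:
    # the residue where both rows agree, a blank otherwise.
    return "".join(q if q == s and q != "-" else " " for q, s in zip(query, subject))


query = "QGIYQK"    # GBS row, residues 121-126
subject = "QGIFQK"  # GAS row, residues 128-133
n = count_identities(query, subject)
print(f"{n}/{len(query)} identical")
print(identity_line(query, subject))
```

For this block five of the six columns are identical; the single Y/F difference is the position shown as "+" in the printed alignment.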

SEQ ID 884 (GBS139) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 19 (lane 3; MW 13kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 22 (lane 2; MW 38.2kDa), in Figure 24 
(lane 7; MW 38kDa) and in Figure 33 (lane 7; MW 38.2kDa). 

The GBS139-GST fusion product was purified (Figure 200, lane 2) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 287), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 277 

A DNA sequence (GBSx0303) was identified in S.agalactiae <SEQ ID 887> which encodes the amino acid 
sequence <SEQ ID 888>. This protein is predicted to be Holliday junction DNA helicase RuvB (ruvB). 
Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.438S (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB75331 GB:Y15896 RuvB protein [Bacillus subtilis] 
Identities = 196/322 (60%), Positives = 254/322 (78%) 

Query: 3 RFLDSDAMGDEELVERTLRPQYLREYIGQDKVKDQLKIFIEAAKLRDESLDHVLLFGPPG 62 
R + S+A E ++E++LRPQ L +YIGQ KVK+ L++FI+AAK+R E+LDHVLL+GPPG 
Sbjct: 4 RLVSSEADNHESVIEQSLRPQNLAQYIGQHKVKENLRVFIDAAKMRQETLDHVLLYGPPG 63 

Query: 63 LGKTTMAFVIANELGVNLKQTSGPAIEKSGDLVAILNDLEPGDVLFIDEIHRMPMAVEEV 122 
LGKTT+A ++ANE+GV L+ TSGPAIE+ GDL AIL LEPGDVLFIDEIHR+ ++EEV 
Sbjct: 64 LGKTTLASIVANEMGVELRTTSGPAIERPGDLAAILTALEPGDVLFIDEIHRLHRSIEEV 123 

Query: 123 LYSAMEDFYIDIMIGAGETSRSVHLDLPPFTLIGATTRAGMLSNPLRARFGITGHMEYYE 182 

LY AMEDF +DI+IG G ++RSV LDLPPFTL+GATTR G+L+ PLR RFG+ +EYY 
Sbjct: 124 LYPAMEDFCLDIVIGKGPSARSTOLDIaPPFTLViaTTOVGl,LTAPLRDRFGVMSRr.EYYT 183 

Query: 183 ENDLTEIIERTADIFEMKITYEAASELARRSRGTPRIANRLLKRVRDYAQIMGDGLIDDN 242 

+ + l + i+ RTAD+FE++I +A E+ARRSRGTPR+ANRLL+RVRD+AQ++GD I ++ 
Sbjct: 184 QEEIADIOTRTADVFEVEIDKPSALEIARRSRGTPRVANRLLRRVRDFAQVLGDSRITED 243 

Query: 243 ITDKALTMLDVDHEGLDYVDQKILRTMIEMYNGGPVGLGTLSVNIAEERDTVEDMYEPYL 302 

I+ AL L VD GLD++D K+L MIE +NGGPVGL T+S I EE T+ED+YEPYL 
Sbjct: 244 ISQNALERLQVDRLGLDHIDHKLLMGMIEKFNGGPVGLDTISATIGEESHTIEDVYEPYL 303 

Query: 303 IQKGFIMRTRTGRVATVKAYEH 324 
+Q GFI RT GR+ T Y H 

Sbjct: 304 LQIGFIQRTPRGRIVTPAVYHH 325 

A related GBS nucleic acid sequence <SEQ ID 10943> which encodes amino acid sequence <SEQ ID 
10944> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 889> which encodes the amino acid 
sequence <SEQ ID 890>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0686 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 282/327 (86%), Positives = 306/327 (93%) 

Query: 1 MTRFLDSDAMGDEELVERTLRPQYLREYIGQDKVKDQLKIFIEAAKLRDESLDHVLLFGP 60 
M R LD++ MG+EE +RTLRPQYL EYIGQDKVK+Q IFIEAAK RDESLDHVLLFGP 
Sbjct: 25 MARILDNNVMGNEEFSDRTLRPQYLHEYIGQDKVKEQFAIFIEAAKRRDESLDHVLLFGP 84 

Query: 61 PGLGKTTMAFVIANELGVNLKQTSGPAIEKSGDLVAILNDLEPGDVLFIDEIHRMPMAVE 120 
PGLGKTTMAFVIANELGVNLKQTSGPA+EK+GDLVAILN+LEPGD+LFIDEIHRMPM+VE 
Sbjct: 85 PGLGKTTMAFVIANELGVNLKQTSGPAVEKAGDLVAILNELEPGDILFIDEIHRMPMSVE 144 

Query: 121 EVLYSAMEDFYIDIMIGAGETSRSVHLDLPPFTLIGATTRAGMLSNPLRARFGITGHMEY 180 
EVLYSAMEDFYIDIMIGAG+TSRS+HLDLPPFTLIGATTRAGMLSNPLRARFGITGHMEY 
Sbjct: 145 EVLYSAMEDFYIDIMIGAGDTSRSIHLDLPPFTLIGATTRAGMLSNPLRARFGITGHMEY 204 

Query: 181 YEENDLTEIIERTADIFEMKITYEAASELARRSRGTPRIANRLLKRVRDYAQIMGDGLID 240 
Y+E DLTEI+ERTA IFE+KI +EAA +LA RSRGTPRIANRLLKRVRDYAQI+GDG+I 
Sbjct: 205 YQEKDLTEIVERTATIFEIKIDHEAARKLACRSRGTPRIANRLLKRVRDYAQIIGDGIIT 264 

Query: 241 DNITDKALTMLDVDHEGLDYVDQKILRTMIEMYNGGPVGLGTLSVNIAEERDTVEDMYEP 300 
ITD+ALTMLDVD EGLDY+DQKILRTMIEMY GGPVGLGTLSVNIAEER+TVE+MYEP 
Sbjct: 265 AQITDRALTMLDVDREGLDYIDQKILRTMIEMYQGGPVGLGTLSVNIAEERNTVEEMYEP 324 

Query: 301 YLIQKGFIMRTRTGRVATVKAYEHLGY 327 
YLIQKGF+MRTRTGRVAT KAY HLGY 
Sbjct: 325 YLIQKGFLMRTRTGRVATQKAYRHLGY 351 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 278 

A DNA sequence (GBSx0304) was identified in S.agalactiae <SEQ ID 891> which encodes the amino acid 
sequence <SEQ ID 892>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.87 Transmembrane 157 - 173 ( 157 - 174) 
INTEGRAL Likelihood = -1.49 Transmembrane 205 - 221 ( 205 - 222) 

Final Results 

bacterial membrane --- Certainty=0.2147 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 893> which encodes the amino acid 
sequence <SEQ ID 894>. Analysis of this protein sequence reveals the following: 



Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3097 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 130/303 (42%), Positives = 202/303 (65%) 

Query: 1 MLKHFGSKVRNLRVTRNITREDFCGDETELSVRQLARIESGQSIPNLTKAHYIAKQLNVK 60 
ML+HFG KV+ LR+ + I+RED CGDE+ELSVRQLARIE GQSIP+L+K +IAK LNV 
Sbjct: 1 MLEHFGGKVTWLRLEKRISREDLCGDESELSVRQLARIELGQSIPSLSKVIFIAKALNVS 60 

Query: 61 LDILTGGESLELPKRYKELKYLILRIPTYADAERLKLRECQFDHIFEEFYDNLPEDECLA 120 
+ LT G LELPKRYKELKYLILR PTY D +L++RE QFD IFE++YD LPE+E + 
Sbjct: 61 VGYLTDGADLELPKRYKELKYLILRTPTYMDDGKLQVREEQFDEIFEDYYDKLPEEEKII 120 

Query: 121 IDSLQAKFEVYQTGDINFGVEVLCECFDKVKYKEKYTl^LIIIDLFrjTCAVVSKFNNRA 180 

ID LQA + + + NFG+++L E F+++K K ++ NDLI+++L+L + + + 
Sbjct: 121 IDCLOATLDTLLSENTNFGIDLLQEYFNQIKTKVRFRQNDLILLELYIAYLDIEGMDGQY 180 

Query: 181 FTKEVFQTI CKTLI SQISfflKLTAEDLFWFNHVLLNCVFVGLCLNSEECLAEMLEVSRQTMV 240 

K + ++ L Q + ++LF N ++++ + L N + L + +E+S++ M 
Sbjct: 181 SDKIFYDSLLDNLSEQFEQFELDELFIVNKIIIDISSLSLKNNRLDNLEKAIEMSQKIMA 240 

Query: 241 STHDFHKMPLYFMYQWKYFITIDNDIKSAENAYQQSIMFSKMIDDKHLIKKLELEWQEDI 300 

D+++MP+ + +WKYF+ DI AE ++ ++ +F++M D++L KL EW++D+ 
Sbjct: 241 KIQDWHRMPILKLIEWKYFLIKQKDIIKAEQSFMKACLFAQMTADQYLENKLIQEWEKDV 300 

Query: 301 TGH 303 
+ 

Sbjct: 301 KSY 303 

SEQ ID 892 (GBS319) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 40 (lane 4; MW 37kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 46 (lane 7; MW 62kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 279 

A DNA sequence (GBSx0305) was identified in S.agalactiae <SEQ ID 895> which encodes the amino acid 
sequence <SEQ ID 896>. This protein is predicted to be adenylosuccinate lyase (purB). Analysis of this 
protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3358 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04344 GB:AP001509 adenylosuccinate lyase [Bacillus halodurans] 
Identities = 326/430 (75%), Positives = 366/430 (84%) 

Query: 1 MIERYSRPEMAAIWTEENKYRAWLEVEILADEAWAELGEIPKEDVAKIREKADFDIDRIL 60 
MIERY+RPEM AIWTEEN+Y+AWLEVEI+A EAWAELGEIPKEDV KIRE A FD++RIL 
Sbjct: 1 MIERYTRPEMGAIWTEENRYQAWLEVEIVACEAWAELGEIPKEDVKKIREHASFDVERIL 60 

Query: 61 EIEQDTRHDVVAFTRAVSETLGEERKWVHYGLTSTDVVDTAYGYLYKQANDIIRRDLENF 120 
EIEQ+TRHDVVAFTRAVSETLGEERKWVHYGLTSTDVVDTA YL KQAN+II DL F 
Sbjct: 61 EIEQETRHDVVAFTRAVSETLGEERKWVHYGLTSTDVVDTALSYLLKQANEIIEADLVRF 120 

Query: 121 TNIVADKAKEHKFTIMMGRTHGVHAEPTTFGLKLATWYSEMKRNIERFEHAAAGVEAGKI 180 
+I+ +KA EHK+T+MMGRTHGVHAEPTTFGLKLA WY EMKRN+ERF AA GV GK+ 
Sbjct: 121 LDILKEKALEHKYTVMMGRTHGVHAEPTTFGLKLALWYEEMKRNLERFRLAAEGVRVGKL 180 

Query: 181 SGAVGNFANIPPFVEQYVCDKLGIRPQEISTQVLPRDLHAEYFAVLASIATSIERMATEI 240 
SGAVG +ANI PFVEQYVC+KLG+ ISTQ L RD HAEY A LA IATSIE+ A EI 
Sbjct: 181 SGAVGTYANIDPFVEQYVCEKLGLERAPISTQTLQRDRHAEYMATLALIATSIEKFAVEI 240 

Query: 241 RGLQKSEQREVEEFFAKGQKGSSAMPHKRNPIGSENMTGLARVIRGHMVTAYENVALWHE 300 
RGLQKSE REVEE+FAKGQKGSSAMPHKRNPIGSENMTG+ARV+RGHM+ AYENV LWHE 
Sbjct: 241 RGLQKSETREVEEYFAKGQKGSSAMPHKRNPIGSENMTGIARVVRGHMLAAYENVPLWHE 300 

Query: 301 RDISHSSAERIITPDTTILIDYMLNRFGNIVKNLTVFPENMMRNMESTFGLIYSQRVMLK 360 
RDISHSSAERII PD TI I+YMLNRFGNIVKNLTVFPENM RNM T+GLIYSQRV+L 
Sbjct: 301 RDISHSSAERIILPDATIAINYMLNRFGNIVKNLTVFPENMKRNMTRTYGLIYSQRVLLS 360 

Query: 361 LIEKGMTREEAYDLVQPKTAYSWDNQVDFKPLLEEDTKVTSCLTQEEIDELFNPIYYTKR 420 

LI+KGM REEAYDLVQPK +W+ V F+ L+E++ ++TS L+ EEI+ F+ ++ K 
Sbjct: 361 LIDKGMVREEAYDLVQPKAMEAWEKGVQFRELVEQEERITSVLSPEEIEACFDYNHHLKH 420 

Query: 421 VDDIFERLGL 430 
VD IFERLGL 

Sbjct: 421 VDTIFERLGL 430 

A related DNA sequence was identified in S.pyogenes <SEQ ID 897> which encodes the amino acid 
sequence <SEQ ID 898>. Analysis of this protein sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3358 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below: 

Identities = 422/430 (98%) , Positives = 428/430 (99%) 

Query: 1 MIERYSRPEMAAIWTEENKYRAWLEVEILADEAWAELGEIPKEDVAKIREKADFDIDRIL 60
M+ERYSRPEMAAIWTEENKY AWLEVEILADEAWAELGEIPKEDVAKIREKADFDIDRIL
Sbjct: 1 t^ERYSRPEMAAIWTEENKYHAWLEVEILADEAWAELGEIPKEDVAKIREKADFDIDRIL 60

Query: 61 EIEQDTRHDWAFTRAVSETLGEERKWVHYGLTSTDVVDTAYGYLYKQANDIIRRDLENF 120
EIEQDTRHDWAFTRAVSETLGEERKWVHYGLTSTDVVDTAYGYLYKQANDIIRRDLENF
Sbjct: 61 EIEQDTRHDWAFTRAVSETLGEERKWVHYGLTSTDVVDTAYGYLYKQANDIIRRDLENF 120

Query: 121 TNIVADKAKEHKFTIMMGRTHGVHAEPTTFGLKLATWYSEMKRNIERFEHAAAGVEAGKI 180
TNIVADKA+EHK TIMMGRTHGVHAEPTTFGLKLATWYSEMKRNIERFEHAAAGVEAGKI
Sbjct: 121 TNIVADKAREHKMTIMMGRTHGVHAEPTTFGLKLATWYSEMKRNIERFEHAAAGVEAGKI 180

Query: 181 SGAVGNFANIPPFVEQYVCDKLGIRPQEISTQVLPRDLHAEYFAVLASIATSIERMATEI 240
SGAVGNFANIPPFVE+YVCDKLGIRPQEISTQVLPRDLHAEYFAVLASIATSIERMATEI
Sbjct: 181 SGAVGNFANIPPFVEEYVCDKLGIRPQEISTQVLPRDLHAEYFAVLASIATSIERMATEI 240

Query: 241 RGLQKSEQREVEEFFAKGQKGSSAMPHKRNPIGSENMTGLARVIRGHMVTAYENVALWHE 300
RGLQKSEQREVEEFFAKGQKGSSAMPHKRNPIGSENMTGLARVIRGHMVTAYENV+LWHE
Sbjct: 241 RGLQKSEQREVEEFFAKGQKGSSAMPHKRNPIGSENMTGLARVIRGHMVTAYENVSLWHE 300

Query: 301 RDISHSSAERIITPDTTILIDYMLNRFGNIVKNLTVFPENMMRNMESTFGLIYSQRVMLK 360
RDISHSSAERIITPDTTILIDYMLNRFGNIVKNLTVFPENMMRNMESTFGLIYSQRVMLK
Sbjct: 301 RDISHSSAERIITPDTTILIDYMLNRFGNIVKNLTVFPENMMRNMESTFGLIYSQRVMLK 360

Query: 361 LIEKGMTREEAYDLVQPKTAYSWDNQVDFKPLLEEDTKVTSCLTQEEIDELFNPIYYTKR 420
LIEKGMTREEAYDLVQPKTAYSWDNQVDFKPLLEEDTKVTSCLTQEEIDELFNPIYYTKR
Sbjct: 361 LIEKGMTREEAYDLVQPKTAYSWDNQVDFKPLLEEDTKVTSCLTQEEIDELFNPIYYTKR 420

Query: 421 VDDIFERLGL 430
VDDIF+RLG+
Sbjct: 421 VDDIFKRLGI 430



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
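
The summary line printed above each alignment gives identities and positives as a fraction of the aligned length, followed by a whole-number percentage. As a minimal sketch only (the helper name `parse_alignment_summary` and the regular expression are illustrative assumptions and are not part of the analysis software used in these Examples), such a line can be parsed and the percentages recomputed as follows:

```python
import re

# Hypothetical helper, not part of the original analysis pipeline.
# Matches fields such as "Identities = 422/430" or "Gaps = 3/347".
SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line):
    """Return {field: (count, aligned_length, percent)} for each field found."""
    result = {}
    for name, count, length in SUMMARY_FIELD.findall(line):
        count, length = int(count), int(length)
        result[name] = (count, length, 100.0 * count / length)
    return result


# Summary line of the GAS/GBS alignment shown above.
summary = parse_alignment_summary(
    "Identities = 422/430 (98%) , Positives = 428/430 (99%)"
)
for name, (count, length, percent) in summary.items():
    print(f"{name}: {count}/{length} = {percent:.1f}%")
```

The recomputed values carry one decimal place, so they may differ slightly from the whole-number percentages printed in the listings.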
Example 280

A DNA sequence (GBSx0306) was identified in S.agalactiae <SEQ ID 899> which encodes the amino acid 
sequence <SEQ ID 900>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -16.24 Transmembrane 145 - 161 ( 119 - 167)
INTEGRAL Likelihood = -9.98 Transmembrane 125 - 141 ( 119 - 144)
INTEGRAL Likelihood = -9.29 Transmembrane 28 - 44 ( 23 - 51)
INTEGRAL Likelihood = -7.01 Transmembrane 196 - 212 ( 193 - 220)
INTEGRAL Likelihood = -6.21 Transmembrane 96 - 112 ( 88 - 116)
INTEGRAL Likelihood = -5.79 Transmembrane 249 - 265 ( 246 - 266)
INTEGRAL Likelihood = -2.87 Transmembrane 222 - 238 ( 222 - 238)
INTEGRAL Likelihood = -2.28 Transmembrane 279 - 295 ( 278 - 295)

Final Results

bacterial membrane --- Certainty=0.7496 (Affirmative) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB13498 GB:AB028634 RNA polymerase [Flammulina velutipes] 
Identities = 83/336 (24%) , Positives = 150/336 (43%) , Gaps = 40/336 (11%) 



Query: 152 I LLLI AFVS IGKNR - VYNFVQNIJsfYFEEVIWNYPEENPVKI KEKSLI I K FLLTIS 205
III L SI NR + ++ N ++ N+F+ + +K K L+I F++ +S
Sbjct: 133 ILFLYLIYS1LINRFILKWLDNSGIIYKININWFKNHMIIOJINKMLVIN1KFFNFIIKLS 192

Query: 206 FVFVIDFAMVRL IMFNIKFSTIIACSAILIAWLYQN KSVTEPFL 249
+ +1 +++ L +NF+I4- I I ++ S+ F
Sbjct: 193 I ITI I GI SIMELFGIFGIHFDIRIII INYLKTINSGKIHLTI INMDQYS VLENS IHT I FY 252

Query: 250 LKKLVIYFIFFIATLIGNLKN-ELSILETPLLFISIFFTMDRI IALSKEMRDLI- -ISKS 306
+ L+I+ IF L N+KN + +1 +L+I IF I ++DL+ ++K
Sbjct: 253 INLLIIFLIFISLILYRNVKNIDTNIKRWIILYILIFLINIIFIFNHIYIKDL^1DNLNKY 312

Query: 307 ILFYYDHENIKPSILLSEIKEIKYLENVDIGE- - -LELVRQMVIRLRLELEEEFLILSDI 363
IL Y D I S+ L ++K L+ ++I + V+ + 1+ ++E L + I
Sbjct: 313 ILDYMDLHIIWSLFLFNKFDVK-LKRIN1YKSYSTVTVKDLEIKSKIEERSNELDIKLI 371

Query: 364 YMKNG-YEKYIQFVQGNVYFINLE--LDKIPNYTNLKLILESIFD HNNQKIFIPKL 416
K G YE YI +4- N+ ++ E L P Y N +E + + + F+ K+
Sbjct: 372 IAKYGSYENYINSIE-NINIVDEEFILKNYPEYINDSKFIEFLMELEPLFRDHTEFVKKI 430

Query: 417 YEEYIYILISLGEVEKAKEIL- - -KEVSDYLTEESL 449
YE L + K+IL KE+ DY+ + +L
Sbjct: 431 YENLNSTNEKLEFLLANKDILSENKEIFDYVLQLNL 466





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 281 

A DNA sequence (GBSx0308) was identified in S.agalactiae <SEQ ID 901> which encodes the amino acid 
sequence <SEQ ID 902>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3307 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
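
Each "Final Results" listing assigns a certainty value to three candidate locations (cytoplasm, membrane, outside). The following is a minimal sketch, under stated assumptions, of how such a listing could be read programmatically. The function names and the regular expression are hypothetical, and the tolerance for stray spaces inside the numbers is an assumption made to cope with text extraction, not a property of the prediction program.

```python
import re

# Hypothetical parser for lines such as
# "bacterial cytoplasm Certainty=0.3307 (Affirmative) < suco".
# Stray spaces inside the number (e.g. "0 . 0000") are tolerated and removed.
RESULT_LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*([0-9. ]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for every result line in text."""
    results = {}
    for site, number, verdict in RESULT_LINE.findall(text):
        results[site] = (float(number.replace(" ", "")), verdict.strip())
    return results


def most_likely_site(results):
    """Return the location with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


listing = """
bacterial cytoplasm Certainty=0.3307 (Affirmative) < suco
bacterial membrane Certainty=0 . 0000 (Not Clear) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
"""
results = parse_final_results(listing)
print(most_likely_site(results), results)
```

Taking the maximum certainty is only one way to summarise a listing; the verdict text ("Affirmative", "Not Clear") is kept alongside the number so that either can be used.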

Example 282 

A DNA sequence (GBSx0309) was identified in S.agalactiae <SEQ ID 903> which encodes the amino acid 
sequence <SEQ ID 904>. This protein is predicted to be purK (purK). Analysis of this protein sequence
reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.0334 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9461> which encodes amino acid sequence <SEQ ID 9462>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 






>GP:CAA04376 GB:AJ000883 purK [Lactococcus lactis]
Identities = 208/347 (59%), Positives = 258/347 (73%), Gaps = 3/347 (0%)

Query: 14 NSFKTIGIIGGGQLGQMMAIAAIYMGHKVITLDPASDCPASRVS-EVIVAPYDDVEALGT 72
N+ +TIGIIGGGQLGQMMAIAA YMGHKVITLDP +C A++VS E+IVAPYDDVE L
Sbjct: 4 NTKQTIGIIGGGQLGQMMAIAAQYMGHKVITLDPNPNCSAAKVSDEDIVAPYDDVENLLR 63

Query: 73 LAARCDVLTYEFENVDADGLDAWSAGQLPQGTDLLRISQNRIFEKDFLANKAGVTVAPY 132
LA CDV+TYEFENV A L + ++PQG LL I+QNR FEK+FL N+A V VAP+
Sbjct: 64 IAYACDVITYEFENVSAKALHEIEGCVRIPQGIRLLEITQNRRFEKEFLTNEAKVNVAPW 123

Query: 133 KVVTSSLDLEGLDLTKTYVLKTATGGYDGHGQKVIRSAEDLPEAQQLANSAQCVLEEFVN 192
++V S+ L +T+ VLKT TGGYDGHGQ V+ + E L A+ L ++CVLE+F++
Sbjct: 124 QLVDSAEKLPET-VTRKQVIjKTTTG^YDGHGQVVlOTDEKLSAAKSLTELSEC^EDFIS 182

Query: 193 FDLEISVIVSGNGQDVTVFPVQENIHRNNILSKTIVPARISDQIADKAKEMAVQIAKKLQ 252
F+ EISVI+SGNG + VFP+ EN HR NIL +TI PARIS ++ + A ++A IA+KL+
Sbjct: 183 FEREISVIISGNGHEYWFPLAENEHRENILHQTISPARISAEITENAYKIATSIAEKLE 242

Query: 253 LSGTLCVEMFATAD-DIIVNEIAPRPHNSGHYSIEACDFSQFDTHILGVLGAPLPPIKLH 311
LSG LCVEMF TAD I VNE+APRPHNSGH++IEACDF+QFD HI G+LG LP KL
Sbjct: 243 LSGVLCVEMFLTADGQIYVNELAPRPHNSGHFTIEACDFNQFDLHIKGILGEDLPEPKLL 302

Query: 312 APAVMFNVLGQHVQQAIDHVAQNPSAHLHMYGKLEAKHNRKMGHVTV 358
PA+M NVLGQHV+ ++ H H YGK +AKHNRKMGHVT+
Sbjct: 303 KPAIMLNVLGQHVEAVTCKLNHEHADWHQHDYGKADAKHNRKMGHVTI 349

A related DNA sequence was identified in S.pyogenes <SEQ ID 905> which encodes the amino acid
sequence <SEQ ID 906>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0334 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below:

Identities = 344/369 (93%) , Positives = 353/369 (95%) 

Query: 1 MRNKEKSQRSQAMNSFKTIGIIGGGQLGQMMAIAAIYMGHKVITLDPASDCPASRVSEVI 60
MRNKEKSQRSQ +NSFKTIGIIGGGQLGQMMAIAAIYMGHKVITLDPASD PASRVSEVI
Sbjct: 1 MRNKEKSQRSQWNSFKTIGIIGGGQLGQMMAIAAIYMGHKVITLDPASDSPASRVSEVI 60

Query: 61 VAPYDDVEALGTLAARCDVLTYEFENVDADGLDAWSAGQLPQGTDLLRISQNRIFEKDF 120
VAPYDDVEALG LAARCDVLTYEFENVDADGLDAWSA QLPQGTDLLRISQNRI EKDF
Sbjct: 61 VAPYDDVEALGQLAARCDVLTYEFENVDADGLDAWSACQLPQGTDLLRISQNRIVEKDF 120

Query: 121 LANKAGVTVAPYKVVTSSLDLEGLDLTKTYVLKTATGGYDGHGQKVIRSAEDLPEAQQLA 180
LANKAGVTVAPYKVVTSSLDL GLDLTKTYVLKT TGGYDGHGQK+IRSAEDLPEAQQLA
Sbjct: 121 LANKAGVTVAPYKVVTSSLDLGGLDLTKTYVLKTETGGYDGHGQKIIRSAEDLPEAQQLA 180

Query: 181 NSAQCVLEEFVNFDLEISVIVSGNGQDVTVFPVQENIHRNNILSKTIVPARISDQIADKA 240
NSAQCVLEEFVNFDLEISVIVSGNG+DVTVFPVQENIHRNNILSKTIVPARISDQLADKA
Sbjct: 181 NSAQCVLEEFWFDLEISVIVSGNGKDVTVFPVQENIHRNNILSKTIVPARISDQLADKA 240

Query: 241 KEMAVQIAKKLQLSGTLCVEMFATADDIIVNEIAPRPHNSGHYSIEACDFSQFDTHILGV 300
K+ AVQIAKKLQLSGTLCVEMF TADDIIVNEIAPRPHNSG YSIEACDFSQFDTHILGV
Sbjct: 241 KKTAVQIAKKLQLSGTLCVEMFTTADDIIVNEIAPRPHNSGRYSIEACDFSQFDTHILGV 300

Query: 301 LGAPLPPIKLHAPAVMFNVLGQHVQQAIDHVAQNPSAHLHMYGKLEAKHNRKMGHVTVFS 360
LGAPLP I+LHAPAVM NVLGQHVQQA D+VA+NPSAHLHMYGKLEAKHNRKMGHVTVF+
Sbjct: 301 LGAPLPQIQLHAPAVMLNVLGQHVQQATDYVAKNPSAH^^ 360

Query: 361 DVPDEVEEF 369
DEV+EF
Sbjct: 361 KDADEVKEF 369

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 283 

A DNA sequence (GBSx0310) was identified in S.agalactiae <SEQ ID 907> which encodes the amino acid 
sequence <SEQ ID 908>. This protein is predicted to be phosphoribosylaminoimidazole carboxylase
catalytic subunit (purE). Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3572 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB12462 GB:Z99107 phosphoribosylaminoimidazole carboxylase I 
[Bacillus subtilis] 
Identities = 106/162 (65%) , Positives = 128/162 (78%) 

Query: 33 MQPIISIIMGSKSDWTTMQKTAEVLDNFGIAYEKKVVSAHRTPDLMFKHAEEARGRGIKI 92
MQP++ IIMGS SDW TM+ ++LD + YEKKWSAHRTPD MF++AE AR RGIK+
Sbjct: 1 MQPLVGIIMGSTSDWETMKHACDILDEIoNVPYEKKWSAHRTPDFMFEYAETARERGIKV 60

Query: 93 IIAGAGGAAHLPGMVAAKTTLPVIGVPVKSRALSGLDSLYSIVQMPGGVPVATMAIGEAG 152
IIAGAGGAAHLPGM AAKTTLPVIGVPV+S+AL+G+DSL SIVQMPGGVPVAT +IG+AG
Sbjct: 61 IIAGAGGAAHLPGMTAAKTTLPVIGVPVQSKMiNGMDSLLSIVQMPGGVPVATTSIGKAG 120

Query: 153 ATNAALTALRILSIEDQNLADALAHFHEEQGKIAEESSNELI 194
A NA L A +ILS D++LA L E + ESS++L+
Sbjct: 121 AVNAGLLAAQILSAFDEDLARKLDERRENTKQTVLESSDQLV 162

A related DNA sequence was identified in S.pyogenes <SEQ ID 909> which encodes the amino acid 
sequence <SEQ ID 910>. Analysis of this protein sequence reveals the following: 

Possible site: 57

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.08 Transmembrane 36 - 52 ( 34 - 52)

Final Results

bacterial membrane --- Certainty=0.2232 (Affirmative) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

>GP:CAA04375 GB:AJ000883 purE [Lactococcus lactis] 
Identities = 105/158 (66%), Positives = 131/158 (82%)

Query: 46 ISIIMGSKSDWATMQKTAEVLDNFGIAYEKKVVSAHRTPDLMFKHAEEARGRGIKIIIAG 105
++IIMG SDWATM+ +TA+ +LD+ FG+AYEKKWS AHRTP LM + + +AR RG K+IIAG
Sbjct: 4 VAIIMGCSSDWATMKETAKILDDFGLA.YEKKWSAHRTPALMAEFSSQARERGYKVIIAG 63

Query: 106 AGGAAHLPGMVAAKTTLPVIGVPVKSRALSGLDSLYSIVQMPGGVPVATMAIGEAGATNA 165
AGGAAHLPGMV+A+T +PVIGVP+KSRALSGLDSLYSIVQMP GVPVATMAIGEAGA NA
Sbjct: 64 AGGZiAHLPGWSAQTLVPVIGVPIKSRALSGIJJSLYSIV^PAGVPVATMAIGERGAKNA 123

Query: 166 ALTALRILSIEDQNLADALAHFHEEQGKIAEESSGELI 203
AL AL++L+ ++NL L + ++ EES+ L+
Sbjct: 124 ALFALQLIANTNENLIQKLLVYRARAQEMVEESNKALL 161

An alignment of the GAS and GBS proteins is shown below: 

Identities = 162/169 (95%), Positives = 164/169 (96%), Gaps = 1/169 (0%)

Query: 27 PLYLNIMQ-PIISIIMGSKSDWTTMQKTAEVLDNFGIAYEKKVVSAHRTPDLMFKHAEEA 85
PL + IM+ PIISIIMGSKSDW TMQKTAEVLDNFGIAYEKKVVSAHRTPDLMFKHAEEA
Sbjct: 35 PLCILIMKTPIISIIMGSKSDWATMQKTAEVLDNFGIAYEKKVVSAHRTPDLMFKHAEEA 94

Query: 86 RGRGIKIIIAGAGGAAHLPGMVAAKTTLPVIGVPVKSRALSGLDSLYSIVQMPGGVPVAT 145
RGRGIKIIIAGAGGAAHLPGMVAAKTTLPVIGVPVKSRALSGLDSLYSIVQMPGGVPVAT
Sbjct: 95 RGRGIKIIIAGAGGAAHLPGMVAAKTTLPVIGVPVKSRALSGLDSLYSIVQMPGGVPVAT 154

Query: 146 MAIGEAGATNAALTALRILSIEDQNLADALAHFHEEQGKIAEESSNELI 194
MAIGEAGATNAALTALRILSIEDQNLADALAHFHEEQGKIAEESS ELI
Sbjct: 155 MAIGEAGATNAALTALRILSIEDQNLADALAHFHEEQGKIAEESSGELI 203

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.
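
Where a transmembrane segment is predicted, the listings report it on a line of the form "INTEGRAL Likelihood = -3.08 Transmembrane 36 - 52 ( 34 - 52)". The sketch below shows one way such lines could be extracted into structured records. It is illustrative only: the function name, the field names `core` and `extended` (used here for the first range and the parenthesised range respectively) and the regular expression are assumptions, not terminology or code from the prediction program.

```python
import re

# Hypothetical parser for lines of the form
# "INTEGRAL Likelihood = -3.08 Transmembrane 36 - 52 ( 34 - 52)".
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return one record per INTEGRAL line, ordered by ascending likelihood."""
    segments = []
    for match in TM_LINE.finditer(text):
        segments.append({
            "likelihood": float(match.group(1)),
            # First residue range on the line (name chosen for this sketch).
            "core": (int(match.group(2)), int(match.group(3))),
            # Range given in parentheses (name chosen for this sketch).
            "extended": (int(match.group(4)), int(match.group(5))),
        })
    # Most negative value first, matching the order used in the listings.
    return sorted(segments, key=lambda segment: segment["likelihood"])


listing = """
INTEGRAL Likelihood = -0.80 Transmembrane 5 - 21 ( 5-21)
INTEGRAL Likelihood = -3.08 Transmembrane 36 - 52 ( 34 - 52)
"""
segments = parse_tm_segments(listing)
for segment in segments:
    print(segment)
```

Sorting by the likelihood value is a presentation choice that mirrors the listings; no interpretation of the score beyond that ordering is implied.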

Example 284 

A DNA sequence (GBSx0311) was identified in S.agalactiae <SEQ ID 911> which encodes the amino acid 
sequence <SEQ ID 912>. This protein is predicted to be phosphoribosylglycinamide synthetase (purD). 
Analysis of this protein sequence reveals the following: 

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1966 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA04374 GB:AJ000883 purD [Lactococcus lactis] 
Identities = 236/419 (56%) , Positives = 298/419 (70%) , Gaps = 7/419 (1%) 

Query: 1 MKLLWGSGGREHAIAKKLIASKDVDQVFVAPGNDGMTLDGLDLVNIGISEHSRLIDFVK 60
MK+LV+GSGGREHA+AKK + S V++VFVAPGN GM DG+ +V+I + +L+ F +
Sbjct: 1 MKILVIGSGGREHALAKKFMESPQVEEVFVAPGNSGMEKDGIQIVHISELSNDKLVKFAQ 60

Query: 61
I T +GP+ AL G+VD F A L FGP K AAELE SKDFAK IM KY VPTA
Sbjct: 61 NQNIGLTFVGPETALMNGWDAFIKAELPIFGPNKMAAELEGSKDFAKSIMKKYGVPTAD 120

Query: 121
Y TF E A AY++E+G P+V+KADGLA GKGV VA +E A A ++ F S
Sbjct: 121

Query: 181
+WIEEFLDGEEFSLF+F + K Y MP AQDHKRA+D DKG NTGGMGAY+PV H+ +
Sbjct: 176

Query: 241
W+ A+E +VKP + GMI EG+ + GVLYAGLILT DG K IEFN+RFGDPETQ++LPR
Sbjct: 236

Query: 301
L SD AQ I DI+ G EP + W + GVTLGVWA+EGYP + G+ LPE +G + YY
Sbjct: 296

Query: 361
AG , EN++ L+S+GGRVY++ T + VK+ Q +Y +L + + G FYR+DIGS+AI
Sbjct: 355 AGVSKNENNQ-LISSGGRVYLVSETGEDVKSTQKLLYEKLDKLENDGFFYRHDIGSRAI 412

A related DNA sequence was identified in S.pyogenes <SEQ ID 913> which encodes the amino acid
sequence <SEQ ID 914>. Analysis of this protein sequence reveals the following:

Possible site: 35 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.80 Transmembrane 5 - 21 ( 5-21) 



Final Results 

bacterial membrane Certainty=0 . 1319 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:CAA04374 GB:AJ000883 purD [Lactococcus lactis] 
Identities = 236/419 (56%) , Positives = 301/419 (71%) , Gaps = 7/419 (1%) 

Query: 50 LKLLWGSGGREHAIAKKLLASKGVDQVFVAPGNDGMTLDGLDLVNIWSEHSRLIAFAK 109
+K+LV+GSGGREHA+AKK + S V++VFVAPGN GM DG+ +V+I + +L+ FA+
Sbjct: 1 MKILVIGSGGREHALAKKFMESPQVEEVFVAPGNSGMEKDGIQIVHISELSNDKLVKFAQ 60

Query: 110 ENEISWAFIGPDDALAAGIVDDFNSAGLRAFGPTKAARELEWSKDFAKEIMVKYNVPTAA 169
I F+GP+ AL G+VD F A L FGP K AAELE SKDFAK IM KY VPTA
Sbjct: 61 NQNIGLTFVGPETALMNGVVDAFIKAELPIFGPNKMAAELEGSKDFAKSIMKKYGVPTAD 120

Query: 170 YGTFSDFEKAKAYIEECGAPIWKADGIjALGKGVVVAETVEQAVEAAQEMLLDNKFGDSG 229
Y TF E A AY++E+G P+V+KADGLA GKGV VA +E A A ++ F S
Sbjct: 121 YATFDSLEPALAYLDEKGVPLVIKADGLAAGKGVTVAFDIETAKSALADI FSGSQ 175

Query: 230 ARWIEEFLDGEEFSLFAFANGDKFYIMPTAQDHKRAFDGDKGPNTGGMGAYAPVPHLPQ 289
+WIEEFLDGEEFSLF+F + K Y MP AQDHKRAFD DKGPNTGGMGAY+PV H+ +
Sbjct: 176 GKWIEEFLDGEEFSLFSFIHDGKIYPMPIAQDHKRAFDEDKGPNTGGMGAYSPVLHISK 235

Query: 290 SVVDTAVEMIWPVLEG^nrM:GRPYLGTOYVGLILTADGPKVIEFNSRFGDPETQIII ) PR 349
W+ A+E +V+P + GM+ EG+ +■ GVLY GIiILT DG K IEFN+RFGDPETQ++LPR
Sbjct: 236 EVVNEALEKOTKPWAGMIEEGKSFTGVLYAGLILTEDGVKTIEFKARFGDPETQVVIjPR 295

Query: 350 LTSDFAQNIDDIMMGIEPYITWQKDGVTLGVWASEGYPFDYEKGVPLPEKTDGDIITYY 409
L SD AQ I DI+ G EP + W + GVTLGWVA+EGYP + G+ LPE +G + YY
Sbjct: 296 LKSDIAQAIIDIIAGNEPTLEWLESGVTLGVVVAAEGYPSQAKLGLILPEIPEG-i^WYY 354

Query: 410 AGVKFSENSELLLSNGGRVYMLVTTEDSVKAGQDKIYTQIjAQQDTTGLFYRNDIGSKS.1 468
AGV +EN++ L+S+GGRVY++ T + VK+ Q +Y +Ii + + G FYR+DIGS+AI
Sbjct: 355 AGVSKNENNQ-LISSGGRVYLVSETGEDVKSTQKLLYEKLDKLENDGFFYRHDIGSRAI 412

An alignment of the GAS and GBS proteins is shown below: 

Identities = 399/421 (94%), Positives = 408/421 (96%) 

Query: 1 MKLLWGSGGREHAIAKKLLASKDVDQVFVAPGNDGMTLDGLDLVNIGISEHSRLIDFVK 60
+KLLWGSGGREHMAKKL1ASK VDQVFVAPGNDGMTLDGLDLVNI +SEHSRLI F K
Sbjct: 50 LKLLWGSGGREHAIAKKLLASKGVDQVFVAPGNDGMTLDGLDLVNIWSEHSRLIAFAK 109

Query: 61 EiffilAWTIiIGPDDAIAAGIVDGFNSAGLRAFGPTKAA^LEWSKDFAKEIlWKYNVPTAA 120
ENEI+W IGPDDALAAGIVD FNSAGLFJ^GPTKAAAELEWSKDFAKEIMVKYNVPTAA
Sbjct: 110 ENEISWAFIGPDDAIAAGIVDDFNSAGLRAFGPTKAARELEWSKDFAKEXMVKYWPTAA 169

Query: 121 YGTFSDFEKAKAYIEEQGAPIWKADGlMiGKG\AnTAETVEQAVFAAQEMLLDMKFGDSG 180
YGTFSDFEKAKRYIEEQ^^PIWKADGLALGKGVWAETVEQAVEAAQEMLLDNKFGDSG
Sbjct: 170 YGTFSDFEKAKAYIEEQ£APIVVKArX3LALGKGV^ 229

Query: 181 ARWIEEFLDGEEFSLFAFANGDKFYIMPTAQDHKRAYDGDKGLNTGGMGAYAPVPHLPQ 240
ARWIEEFLDGEEFSLFAFANGDKFYIMPTAQDHKRA+DGDKG NTGGMGAYAPVPHLPQ
Sbjct: 230 ARWIEEFLIX3EEFSLFAFANGDKFYIMPTAQriHKRAFDGDRGPNTGGMGAYAPVPHIjPQ 289

Query: 241 SVVDTAVETIVKPVLEGMlAEGRPYLGVLYAGrjILTADGPJCVIEFNSRFGDPETQIILPR 300
SWDTAVE IV+PVLEGM+AEGRPYLGVLY GLILTADGPKVIEFNSRFGDPETQIILPR
Sbjct: 290 SWDTAVEMIVRPVLEGMVAEGRPYLGVLYVGLILTADGPKV1EFNSRFGDPETQIILPR 349

Query: 301 LTSDFAQNIDDIMMGlEPYITWQKDGVTLGVWASEGyPLDYEKGVPLPEKTDGDIITYY 360
LTSDFAQNIDDIMMGIEPYITWQKDGVTLGVWASEGYP DYEKGVPLPEKTDGD I ITYY
Sbjct: 350 LTSDFAQNIDDIMMG1EPYITWQKDGVTLGVWASEGYPFDYEKGVPLPEKTDGDIITYY 409

Query: 361 AGAKFAENSKALLSNGGRVYMLVTTEDSVKAGQDKIYTQLAQQDTTGLFYRNDIGSKAIKE 421
AG 5CF+ENS+ LLSNGGRVYMLVTTEDSVKAGQDKIYTQLAQQDTTGLFYRNDIGSKAI+E
Sbjct: 410 AGVKFSENSELLDSNGGRVYMLVTTEDSVKAGQDKIYTQLAQQDTTGLFYRNDIGSKAIRE 470

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 285 

A DNA sequence (GBSx0312) was identified in S.agalactiae <SEQ ID 915> which encodes the amino acid
sequence <SEQ ID 916>. Analysis of this protein sequence reveals the following: 
Possible site: 36 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.28 Transmembrane 235 - 251 ( 235 - 251)

Final Results

bacterial membrane Certainty=0.1510 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database: 
>GP:AAA23257 GB:M81878 unknown [Clostridium perfringens]
Identities = 66/258 (25%), Positives = 119/258 (45%), Gaps = 9/258 (3%)

Query: 1 MTIYDQIESALDLMTDLEREIACYFMGQPISKDALASTIVTKQLHISQAALTRFAKKCGF 60
MI +Q+E+ T E+ + Y + + +1+ K+ + +A +TRF KK GF
Sbjct: 1 MGILEQLENPKFKATKSEKTLIEYIKSDLDNIIYKSISIIAKESGVGEATITRFTKKLGF 60

Query: 61 KGYREFVFEYLKS-HETISQQLYGLQNDNTKKVFMNYQEMISKSADI IDEEQL 112
G+++F K + + L + V +M+ S +1 ID + +
Sbjct: 61 NGFQDFKVTLAKEISNKKNTSIINLHVHRDESVTETANKMLKSSINILEQWKQIDLDLM 120

Query: 113 LEVSHMIEQADRVYFYGKGSSSLVAKEFKIRLMRLGVICFALDDTDSFSWTNSIVNDRCL 172
+ +1 A RVYF G G S + A + + MR+G + D+ + +SI ND +
Sbjct: 121 CKCRDLI^AKRWFIGIGYSGIAATDINYKFMRIGFTTVPVTDSHTMVIMSSITNDDDV 180

Query: 173 VIAFSLSGNTNSVIGALKIASCHGAKTVLFTK-QPHTIDYAFDKIIQVASARHLDYGNRI 231
++A S SG T VI +K A +G K + T+ + + D + SA + I
Sbjct: 181 IVAISNSGTTKEVIKTVKQAKENGTKIITLTEDSDNPLRKLSDYELTYTSAETIFETGSI 240

Query: 232 SPQIPMLIMVDIIYAQFL 249
S +IP + ++D++Y + +
Sbjct: 241 SSKIPQIFLLDLLYTEVI 258

A related DNA sequence was identified in S.pyogenes <SEQ ID 917> which encodes the amino acid 
sequence <SEQ ID 918>. Analysis of this protein sequence reveals the following:

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.88 Transmembrane 243 - 259 ( 242 - 261) 

Final Results

bacterial membrane --- Certainty=0.2954 (Affirmative) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

A related sequence was also identified <SEQ ID 9093> which encodes the amino acid sequence <SEQ ID
9094>. Analysis of this protein sequence reveals the following:

Possible cleavage site: 56

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.88 Transmembrane 239 - 255 ( 238 - 257)

Final Results

bacterial membrane Certainty=0.295 (Affirmative) < suco

bacterial outside Certainty=0.000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below: 

Identities = 138/263 (52%) , Positives = 189/263 (71%) , Gaps = 2/263 (0%) 

Query: 6 QIESALDLMTDLEREIACYFMGQPISKDALASTIVTKQLHISQAALTRFAKKCGFKGYRE 65
+IE++L+ MT LE+ IA +F+ ++ L ++ + K+LHISQAALTRFAKKCGF GYR
Sbjct: 14 KIEASLEHMTSLEKGIAHFFITTDLTPQELTASEIVKRLHISQAALTRFAKKCGFTGYRA 73

Query: 66 FVFEYLKSHETISQQLYGLQNDNTKKVFMNYQEMISKSADIIDEEQLLEVSHMIEQADRV 125
F F+YL S + + + + TK+V M+Y +I+K+ ++++EE+LL ++ +I+ ++RV
Sbjct: 74 FAFDYLHSLQESQETFQSIHLELTKKVLMDYDALINKTYELVNEEKLLNLAKLIDSSERV 133

Query: 126 YFYGKGSSSLVAKEFKIRLMRLGVICFALDDTDSFSWTNSIVNDRCLVIAFSLSGNTNSV 185
YF+GKGSS LVA+E K+R MRLG+IC+A DTD F+W NS+VN+ CLV FSLSG TNSV
Sbjct: 134 YFFGKGSSGLVAREMKLRF^LGLIOIAYSDTDGFTWANSLVNENCLVFGFSLSGKTNSV 193

Query: 186 IGALKIASCHGAKTVLFTKQPHT-IDYAFDKIIQVASARHLDYGNRISPQIPMLIMVDII 244
I AL AS GAKTVL T T D + D II V+S L YGNR+SPQ P+LIM+DII
Sbjct: 194 ITALHQASQRGAKTVLLTTDNQTEFDDSLD-IIPVSSTHQLHYGNRVSPQFPLLIMMDII 252

Query: 245 YAQFLDINKIEKERIFRETIIQR 267
YA L I+K KE+IF+ TII +
Sbjct: 253 YAYVLAIDKPHKEKIFKNTIIDK 275

SEQ ID 916 (GBS320) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 40 (lane 5; MW 33kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 85 (lane 7; MW 58kDa) and in Figure 
160 (lane 7 & 8; MW 58kDa). 

GBS320-GST was purified as shown in Figure 224, lane 3-4.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 286 

A DNA sequence (GBSx0313) was identified in S.agalactiae <SEQ ID 919> which encodes the amino acid 
sequence <SEQ ID 920>. This protein is predicted to be xylan esterase 1 (cephalosporin-C). Analysis of this
protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.4981 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB68821 GB:AF001926 xylan esterase 1 [Thermoanaerobacterium sp. 
'JW/SL YS485'] 

Identities = 133/299 (44%), Positives = 188/299 (62%), Gaps = 1/299 (0%) 

Query: 5 MSLDDMREYLGQDQIPEDFDDFWKKQTMKYQG-NIEYRLDKKDFNITFAQAYDLHFKGSN 63
M L +REY G + PEDFD++W + ++++L+F ++FA+ YDL+F G
Sbjct: 6 MPLQKLREYTGTNPCPEDFDEYWNRALDEMRSVDPKIELKESSFQVSFAECYDLYFTGVR 65

Query: 64 NSIVYAKCLFPKTNKPYPWFYFHGYQNQSPDWSDQLNYVAAGYGWSMDVRGQAGQSQD 123
+ ++AK + PKT +P + FHGY + S DW+D+!LNYVAAG+ W+MDVRGQ GQSQD
Sbjct: 66 GARIHAKYIKPKTEGKHPALIRFHGYSSNSGDWNDKI^rrVAAGFTWAMDVRGQGGQSQD 125

Query: 124 KGHFDGITVKGQIVRGMISGPNHLFYKDIYLDVFQLIDIIATLESVDSNQLYSYGWSQGG 183
G G T+ G I+RG+ +++ ++ I+LD QL 1+ + VD +++ G SQGG
Sbjct: 126 VGGOTGNTLNGHIIRGLDDDADNMLFRHIFIiDTAQIAGIVMNMPEVDEDRVGVMGPSQGG 185

Query: 184 AIALIAAAI^PKIVKTVAVYPFLSDFRRvLDLGGVSEPYDELFRYFKYSDPFHKTENNvIj 243
L+L AAL P++ K V+ YPFLSD++RV DL Y E+ YF+ DP H+ EN V
Sbjct: 186 GLSIACAALEPRWKOTSEYPFLSDYKRVWDLDLAKNAYQEITDYFRLFDPRHERENEVF 245

Query: 244 KTIAYIDVKNFAHRISCPVVLLTALKDDICPPSTQFAIFNRLTSTKKHLLLPDYGHDPM 302
L YIDVKN A RI V++ L D +CPPST FA +N + S K + PDYGH+PM
Sbjct: 246 TKLGYIDvTOOjAKRIKGDVLMCTGL^QVCPPSTVFAAYNNIQSKKDIKVYPDYGHEPM 304

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
Example 287

A DNA sequence (GBSx0314) was identified in S.agalactiae <SEQ ID 921> which encodes the amino acid
sequence <SEQ ID 922>. Analysis of this protein sequence reveals the following: 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -5.73 Transmembrane 128 - 144 ( 126 - 145)

Final Results

bacterial membrane Certainty=0.3293 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA23256 GB:M81878 unknown [Clostridium perfringens]
Identities = 78/160 (48%), Positives = 110/160 (68%)

Query: 131 CLTIGTGIGGCLIIDKTVFHGFSNSACEVGYMHLSDGDFQDLASTTALIADVAKAHGDEI 190
CLTIGTGIGG LIID V HGFSNSA E+GYM ++ + QD+AS +AL+ +VA G E
Sbjct: 18 CLTIGTGIGGALIIDGKVLHGFSNSAGEIGy^lMVNGENIQDIASASALVKNVALRKGvEP 77

Query: 191 SRWDGRRIFQEAKKGNEKCIASIDRMINYLGQGIANMVYVVNPEKVVLGGGIMAQKDYLQ 250
S DGR + + G+ C ++++ + L GI+N+VY++NPE WLGGGIMA+++ +
Sbjct: 78 SSIDGRYvLDNYENGDLICKEEVEKLADNLAI^ISNIWLINPEVvVLGGGIMAREEVFR 137

Query: 251 DKLSESLKRNLVTSLAEKTAIVFAQHENQAGMLGAYYHFK 290
+ SL++ L+ S+ T I FA+ +N AGM GAYY+FK
Sbjct: 138 PLIENSLRKYLIESVYNNTKIAFAKLKNTAGMKGAYYNFK 177

A related DNA sequence was identified in S.pyogenes <SEQ ID 923> which encodes the amino acid 
sequence <SEQ ID 924>. Analysis of this protein sequence reveals the following:

Possible site: 22 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.30 Transmembrane 128 - 144 ( 127 - 145)
INTEGRAL Likelihood = -0.11 Transmembrane 227 - 243 ( 227 - 243)

Final Results

bacterial membrane Certainty=0.2720 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

>GP:BAB04516 GB:AP001509 glucose kinase [Bacillus halodurans] 
Identities = 97/291 (33%) , Positives = 155/291 (52%) , Gaps = 14/291 (4%) 

Query: 5 LAIDIGGTAIKYGLISETGDLLEKEEMATEAYKGGPSILEKVKGLVKTYQDQMDLAGVAI 64
+ ID+GGT IK L+S+ G+++ +E TEA +G ++ K+ L + D AG+ I
Sbjct: 3 VGIDLGGTKIKAALVSDAGEIISVQECPTEAAQGPEEVMNKMMSLTEKVTDHQPFAGIGI 62

Query: 65 SSAGMVNPDEGEIFYAGPQIPNYAGTQFKKEIEETFGLPCEVENDVNCAGLAEAISGSAK 124
+ G ++ EG I + P +P + +E F P +++ND N A LAEA+ GS +
Sbjct: 63 GAPGPLSSTEGTIL-SPPNLPGKJDHIHLVDRFQEQFQCPVKLDNDAWAALAEALLGSGQ 121

Query: 125 DYPVALCLTIGTGIGGCLLFNSQVFHGSSHSACEVG YLHLSDGQFQDLAS 174
+ LTI TGIGG + + + HG+S A E+G + +L+ G + LAS
Sbjct: 122 GFTSVFYLTISTGIGGGYVLDGSIVHGASDYAGEIGNMIVQPNGYQHANLNPGSLEGLAS 181

Query: 175 TTALVQEWLAYGDDISQWDGRRIFEQAKAGDAICIAAISKQVDYLGQGIANICYVVNPN 234
TA+ + +G + R +F+Q + GD + + +DYL GIANI + +NP+
Sbjct: 182 GTAIGRMARERFG VEGGTREVFDQIRRGDHDMQRLVEEAMDYLAIGIANIAHTINPD 238

Query: 235 VVVLGGGIMAQKDYLADKLKTALDSYLVSSLAKKTQLKFASHGNNAGILGA 285
V VLGGG+M D + +K + YL LA+ T + A G ++G+LGA
Sbjct: 239 VFVLGGGVMNADDLILPIVKEKVSRYLYPGLAQSTTIVKAKLGGDSGVLGA 289

An alignment of the GAS and GBS proteins is shown below:

Identities = 192/292 (65%) , Positives = 237/292 (80%) 

Query: 1 MTRTVAIDIGGTMIKHGIVDNLGCIVEASELATEAYKGGPGILQKVCQIIDNYLAEGSID 60
M +AIDIGGT IK+G++ G ++E E+ATEAYKGGP IL+KV ++ Y + +
Sbjct: 1 MKHYLAIDIGGTAIKYGLISETGDLLEKEEMATEAYKGGPSILEKVKGLVKTYQDQMDLA 60

Query: 61 GIAISSAGMVDPDEGCIFYSGPQIPNYAGTQFKKVLEDTYQVRTEIENDVNCAGLAEAVS 120
G+AISSAGMV+PDEG IFY+GPQIPNYAGTQFKK +E+T+ + E+ENDVNCAGLAEA+S
Sbjct: 61 GVAISSAGMVNPDEGEIFYAGPQIPNYAGTQFKKEIEETFGLPCEVENDVNCAGLAEAIS 120

Query: 121 GSAKDSSIALCLTIGTGIGGCLIIDKTVFHGFSNSACEVGYMHLSDGDFQDLASTTALIA 180
GSAKD +ALCLTIGTGIGGCL+ + VFHG S+SACEVGY+HLSDG FQDLASTTAL+
Sbjct: 121 GSAKDYPVALCLTIGTGIGGCLLFNSQVFHGSSHSACEVGYLHLSDGQFQDLASTTALVQ 180

Query: 181 DVAKAHGDEISRWDGRRIFQEAKKGNEKCIASIDRMINYLGQGIANMVYVVNPEKVVLGG 240
+V A+GD+IS+WDGRRIF++AK G+ CIA+I + ++YLGQGIAN+ YWNP WLGG
Sbjct: 181 EVVIAYGDDISQWDGRRIFEQAKAGDAICIAAISKQVDYLGQGIANICYVVNPNVVVLGG 240

Query: 241 GIMAQKDYLQDKLSESLKRNLVTSLAEKTAIVFAQHENQAGMLGAYYHFKNR 292
GIMAQKDYL DKL +L LV+SLA+KT + FA H N AG+LGAYYHFK +
Sbjct: 241 GIMAQKDYLADKLKTALDSYLVSSLAKKTQLKFASHGNNAGILGAYYHFKQK 292

SEQ ID 922 (GBS331) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 60 (lane 2; MW 35.9kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 67 (lane 3; MW 61kDa).

The GBS331-GST fusion product was purified (Figure 209, lane 3) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 309), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 288 

A DNA sequence (GBSx0315) was identified in S.agalactiae <SEQ ID 925> which encodes the amino acid 
sequence <SEQ ID 926>. This protein is predicted to be an acylneuraminate lyase (nanA). Analysis of this
protein sequence reveals the following: 

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0894 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA69950 GB:Y08695 putative acylneuraminate lyase [Clostridium tertium]

Identities = 162/225 (72%), Positives = 191/225 (84%)

Query: 1 MKDLQKYQGIIPAFYACYDDKGDICPERVKALTNYFIDKGVQGLYVNGSSGECIYQSVAD 60
M++L+KY+GIIPAFYACYDD+G I PER + T Y IDKGV+GLYV GSSGECIYQS +
Sbjct: 1 MRNLEKYKGIIPAFYACYDDEGKISPERTQMFTQYLIDKGVKGLYVCGSSGECIYQSKEE 60

Query: 61 RKLVLENVMSVAKGKLWIAHVACNNTKDSVELM1HAEAIGVDAIAAIPPIYFRLPEYAI 120
RK+ LENVM VAKGK+T+IAHV CNNT+DS ELA HAE+IGVDAIA+IPPIYF LP+Y+I
Sbjct: 61 RKITLENVMKVAKGKITIIAHVGCMNTRDSEELAEHAESIGVDAIASIPPIYFHLPDYSI 120

Query: 121 ADYWNTISQAAPQTDFIIYNIPQLAGVALTSDLYRKMLQNPQVIGVKNSSMPVQDIQNFV 180
A+YWN IS AAP TDFIIYNIPQLAGV L +LY++ML+NP+VIGVKNSSMPVQDIQ F
Sbjct: 121 AEYWNDISNAAPNTDFIIYNIPQLAGVGLGINLYKQMLKNPRVIGVKNSSMPVQDIQMFK 180

Query: 181 AIGGENHIVFNGPDEQFLGGRLMGAAAGIGGTYGVMPELYLTIiNQ 225
I G+ +VFNGPDEQF+ GR+MGA GIGGTY VMPEL+L ++
Sbjct: 181 DISGDESWFNGPDEQFVAGRIMGADGGIGGTYAVMPELFIAADK 225

A related DNA sequence was identified in S.pyogenes <SEQ ID 927> which encodes the amino acid 
sequence <SEQ ID 928>. Analysis of this protein sequence reveals the following: 

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0981 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 238/304 (78%), Positives = 263/304 (86%)

Query: 1 MKDLQKYQGIIPAFYACYDDKGDICPERVKALTNYFIDKGVQGLYVNGSSGECIYQSVAD 60
M DL KYQGIIPAFYACYDD+G+I PERV+ALT Y+IDKGVQGLY+NGSSGECIYQSV D
Sbjct: 1 MTDLTKYQGIIPAFYACYDDQGNISPERVRALTQYYIDKGVQGLYINGSSGECIYQSVFD 60

Query: 61 RKLVLENVMSVAKGKLWIAWACNOTKDSvELfi^ 120
R+LVLENVM+VAKGKLT+ I HVACNNTKDS+ELA H+E +GVDAIAAIPPIYFRLPEYA+
Sbjct: 61 RQLVLENVMAVAKGKLTIINHVACNNTKDSIELAAHSERLGVDAIAAIPPIYFRLPEYAV 120

Query: 121 ADYWNTISQAAPQTDFIIYNIPQLAGVALTSDLYRKMLQNPQVIGVKNSSMPVQDIQNFV 180
ADYWN IS AAP TDFIIYNIPQLAGVALT LY+ ML N +VIGVKNSSMPVQDIQ F
Sbjct: 121 ADYWNAISSAAPHTDFIIYNIPQLAGVALTPSLYKTMLANKRVIGVKNSSMPVQDIQTFC 180

Query: 181 AIGGENHIVFNGPDEQFLGGRLMGAAAGIGGTYGVMPELYLTLNQLIVDKDLEKARELQF 240
AIGG++HIVFNGPDEQFLGGRLMGAAAGIGGTYG MPEL+L LNQLI DKDLEKA+ LQ+
Sbjct: 181 AIGGDDHIVFNGPDEQFLGGRLMGAAAGIGGTYGAMPELFLRLNQLIADKDLEKAKALQY 240

Query: 241 TINDIITKLCSGHGNMYAVIKAVLEINEQLTIGSVRLPLASVTEEDKPIIKEAAEMIRHA 300
TIN+II L S HGNMY VIK VL INE L IGSVR PLA + EED+ I + AA +I A
Sbjct: 241 TINEIIGVLVSAHGNMYGVIKEVLRINEGLDIGSVRSPLAELVEEDRVICQRAAALINQA 300

Query: 301 KKQF 304
K+ F
Sbjct: 301 KETF 304





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
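
The alignment statistics quoted in this example ("Identities = 238/304 (78%)", "Positives = 263/304 (86%)") can be checked mechanically. The Python sketch below is illustrative only and is not part of the original analysis. The helper names parse_alignment_stats and truncated_percent are hypothetical, and the assumption that the listed percentages are truncated rather than rounded is inferred from the figures in the listings, not from any stated rule.

```python
import re

# Matches entries such as "Identities = 238/304 (78%)". The pattern tolerates
# the stray spaces that appear around punctuation in the listings.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_stats(line):
    """Return {name: (count, length, reported_percent)} for one statistics line."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in STAT_RE.findall(line)
    }


def truncated_percent(count, length):
    """Integer percentage, truncated (assumed convention, see lead-in)."""
    return 100 * count // length


stats = parse_alignment_stats(
    "Identities = 238/304 (78%) , Positives = 263/304 (86%)"
)
for name, (count, length, reported) in stats.items():
    # Print the reported value next to the recomputed one for comparison.
    print(name, f"{count}/{length}", reported, truncated_percent(count, length))
```

A mismatch between the reported and recomputed percentage is a quick indication that a count in a statistics line may have been corrupted during text extraction.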

Example 289 

A DNA sequence (GBSx0317) was identified in S.agalactiae <SEQ ID 929> which encodes the amino acid 
sequence <SEQ ID 930>. Analysis of this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.45 Transmembrane 82 - 98 ( 79 - 111)
INTEGRAL Likelihood = -6.85 Transmembrane 24 - 40 ( 21 - 52)
INTEGRAL Likelihood = -5.26 Transmembrane 180 - 196 ( 172 - 200)
INTEGRAL Likelihood = -5.10 Transmembrane 160 - 176 ( 158 - 179)
INTEGRAL Likelihood = -4.35 Transmembrane 110 - 126 ( 106 - 130)

Final Results

bacterial membrane Certainty=0.4779 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05827 GB:AP001514 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 40/148 (27%) , Positives = 74/148 (49%) , Gaps = 4/148 (2%) 

Query: 14 YNNPFMQGCNWFDLALLNLLFMI -TCLPLVTIG- -AAKISLYRTLWQKLEGD-QTNLLI 69 

+++ F Q C+ ++ LA +NLL++ T L LV +G A +++ L + G+ + 
Sbjct: 6 MSSRFYQTCDWIWKLAYINLLWLSGTLMLVVLGFLPATTAMFTVLRKWFTGNPDVAITR 65 

Query: 70 LYIKHLKKEWFQGMLLGLVELSILWIIFDLTILHYQIGFIVSFLKITCYAFLLLTVMTS 129 

+ + K E+ + LLG V L ++ F+ L G + L + YAFL+L ++T 
Sbjct: 66 TFFQAYKNEFLKINLLGAVLLLGAYILYFNYIWLGTVEGTVHMVLSLGWYAFLILYIITL 125 

Query: 130 IYLFPMAARYEMSLLDTVKKSFIMACLN 157
Y+ P Y + L +K + I+ +N
Sbjct: 126 FYIIPAYVHYNLKLFQYIKTALIIGFVN 153

A related DNA sequence was identified in S.pyogenes <SEQ ID 931> which encodes the amino acid
sequence <SEQ ID 932>. Analysis of this protein sequence reveals the following:

Possible site: 24
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.6944 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB05582 GB:AP001513 unknown conserved protein in bacilli 
[Bacillus halodurans] 
Identities = 59/194 (30%) , Positives = 93/194 (47%) , Gaps = 11/194 (5%) 

Query: 17 SKWMRASAALFDLLVFNLLFVL-SCLPLLTIGV--AKMALYASLLDWREGQVS-QLVTTY 72 

+K M+ + L+ NLL++L S + + +GV A +L+A W + + L TY 
Sbjct: 8 TKIMKLFEWIMRLVYLNLLWLLFSFIGGIILGVMPATASLFAVFRKWYQKEDDFPLFQTY 67 

Query: 73 SSHFKYYFKSGLRLGLIELGIMTICLLDLFLIRNQSGLVFQGFKVLCVAVLFLWILFLY 132 

+ FK FK +GL +1 I LD+ L+ S + Q + A+ F+ ++ LY 
Sbjct: 68 LNEFKRSFKIANLVGLTLVLIGGILYLDVLLLLGTSHWIGQLLLMGVGALSFIYLVTLLY 127 

Query: 133 AYPQAVKRDLSLSTLFKRSFLLAGLFFPWSFAFLAFICLTIFSLQL SLLTLFGGVS 188 

+P V DLS FK SFLL G+ P+ LI L++ +L LL LF S 

Sbjct: 128 IFPTLVHFDLSYKQYFKHSFLL-GVLQPFR-TLLLMITLSLSALLFLTFPILLPLF-AAS 184 

Query: 189 LLAIIGISSLTYLY 202 

+A + + S + Y 
Sbjct: 185 FMAALTMWSFLFGY 198 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 68/210 (32%), Positives = 117/210 (55%)

Query: 3 K^QLIAAIFDVMPFMQGCNWFDLALLNLLFMITCLPLVTIGAAKISLYRTLWQKLEG 62
K L+ ++F +++ +M+ +FDL + NLLF+++CLPL+TIG AK++LY +L EG
Sbjct: 4 KKQGLLHSLFKLDSKWMRASAALFDLLVFNLLFVLSCLPLLTIGVAKMALYASLLDWREG 63

Query: 63 DQTNLLILYIKHLKKEWFQGMLLGLVELSILWIIFDLTILHYQIGFIVSFLKITCYAFL 122
+ L+ Y H K + G+ LGL+EL 1+ + + DL ++ Q G + K+ C A L
Sbjct: 64 QVSQLVTTYSSHFKYYFKSGLRLGLIELGIMTICLLDLFLIRNQSGLVFQGFKVLCVAVL 123

Query: 123 LLTVMTSIYLFPMAARYEMSLLDTVKKSFIMACLNLKWTGVLMFLLIMTWFIMVQSSLLF 182
L V+ +Y +P A + ++SL K+SF++A L W+ +++TF+SL
Sbjct: 124 FLWILFLYAYPQAVKRDLSLSTLFKRSFLLAGLFFPWSFAFLAFICLTIFSLQLSLLTL 183

Query: 183 MLTVSAIFIFAYTAFAYFKIIILQKQFAYF 212
VS + I ++ Y +II++ F
Sbjct: 184 FGGVSLLAIIGISSLTYLYLIIMESLLRRF 213

A related GBS gene <SEQ ID 8535> and protein <SEQ ID 8536> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2
McG: Discrim Score: 3.27
GvH: Signal Score (-7.5): -4.23

Possible site: 46
>>> Seems to have an uncleavable N-term signal seq
modified ALOM score: 2.39
*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.4779 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



ORF00072(364 - 828 of 1260) 

EGAD|108353|BS3003(14 - 171 of 222) hypothetical protein {Bacillus subtilis}
OMNI|NT01BS3507 conserved hypothetical protein GP|2635493|emb|CAB14987.1||Z99119 similar to
hypothetical proteins from B. subtilis {Bacillus subtilis}

GP|2293197|gb|AAC00275.1||AF008220 YteU {Bacillus subtilis} PIR|D69991|D69991 conserved
hypothetical protein yteU - Bacillus subtilis
%Match = 5.9
%Identity = 26.6 %Similarity = 50.6
Matches = 42 Mismatches = 74 Conservative Sub.s = 38

270 300 330 360 390 417 441 471

IMSKKGY*KC*WRKKYREYIVKKANQLIAAIFDVNNPFMCGGIVVF

I = =1 III- III I I =1= =
MEHDGSLGRMLRFCEWIMRFAYTNLLWLFFTLLGLGVFGIMPATAALFAVMR

10 20 30 40 50

498 528 558 588 618 648 678 708

QKLEG-DQTNLLILYIKHLKKEWFQGMLLGLVELSILWIIFDLTILHYQIGFIVSFLKITCYAFLLLTVMTSIYLFPMA
: ::| | :| : : | |:|: ||| | | |:| || :: | |:: |: :| | |:||:

KWIQGQDNVPVLKTFWQEYKGEFFRSNLLGAVLALIGVIIYIDLALI-YPSHFLLHILRFAIMIFGFLFVSMLFYVFPLL
70 80 90 100 110 120 130

738 768 798 828 858 888 918 948

ARYEMSLLDTVKKSFIMACLNLKWTGVLMFLLIMTWFIMVQSSLLFMLTVSAIFIFAYTAFAYFKIIILQKQFAYFSKQQ
|| |:::: |::| :: | : :|::
VHFDWKIQlLYWFSLLLSVAYLQYTLTMLALWALFFLriAYLPGIVPFFSVSLISYCHMRIVYAVLLKVEQHGGEPQRKS
150 160 170 180 190 200 210

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
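
The "INTEGRAL Likelihood ... Transmembrane ..." rows in this example list predicted membrane-spanning segments in order of likelihood rather than position. The Python sketch below is illustrative only and is not part of the original analysis. It parses rows of that form and sorts the segments by start residue; the Segment type and the parse_segments helper are hypothetical names, and the row values are those listed for SEQ ID 930 above.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float  # more negative values are listed first in the output
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment
    outer_start: int   # first residue of the bracketed, wider range
    outer_end: int     # last residue of the bracketed, wider range


INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Extract every INTEGRAL row found in a block of listing text."""
    return [
        Segment(float(lk), int(a), int(b), int(c), int(d))
        for lk, a, b, c, d in INTEGRAL_RE.findall(text)
    ]


listing = """
INTEGRAL Likelihood = -9.45 Transmembrane 82 - 98 ( 79 - 111)
INTEGRAL Likelihood = -6.85 Transmembrane 24 - 40 ( 21 - 52)
INTEGRAL Likelihood = -5.26 Transmembrane 180 - 196 ( 172 - 200)
INTEGRAL Likelihood = -5.10 Transmembrane 160 - 176 ( 158 - 179)
INTEGRAL Likelihood = -4.35 Transmembrane 110 - 126 ( 106 - 130)
"""

# Order the segments along the protein rather than by likelihood.
segments = sorted(parse_segments(listing), key=lambda s: s.start)
for seg in segments:
    print(seg.start, seg.end, seg.likelihood)
```

Sorting by position makes it easier to compare the predicted topology of the GBS protein with that of its GAS counterpart.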

Example 290 

A DNA sequence (GBSx0318) was identified in S.agalactiae <SEQ ID 933> which encodes the amino acid 
sequence <SEQ ID 934>. Analysis of this protein sequence reveals the following: 

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1827 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC44392 GB:U43526 ORF-1 [Streptococcus pneumoniae]
Identities = 48/151 (31%), Positives = 66/151 (42%), Gaps = 5/151 (3%)

Query: 1 MIYDHLLNLTHYKDINPNLDLAIDYLLSHDLRNLDIGTYHISPEVILMVQSNQLSFIS-FD 59
MI + L Y +NP+ ID+L L NL G+ I + L++
Sbjct: 1 MIITKISRLGTYVGVNPHFATLIDFLEKTGLENLTEGSIAIDGNRLFGNCFTYLADGQAG 60

Query: 60 HIFEYHKKYLDIHYVIEGHEVIKLGKGDKVEV-EEY--LGDIGFIKCSEETSFDLRDNYI 116
FE H+KYLDIH V+E E + + + V V +EY DI E LR
Sbjct: 61 AFFETHQKYLDIHLVLElffiEA^VTSPEOTSvTQEYDEEKDIELYTGKVEQLVHLRAGEC 120

Query: 117 AFFFPEEAHQPNGMGSLGNYVKKGVLKVLMA 147
FPE+ HQP + VKK V KV ++
Sbjct: 121 LITFPEDLHQPK-VRINDEPVKKWFKVAIS 150

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 291 

A DNA sequence (GBSx0319) was identified in S.agalactiae <SEQ ID 935> which encodes the amino acid 
sequence <SEQ ID 936>. This protein is predicted to be sugar ABC transporter, permease protein (araQ). 
Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane Certainty=0.3951 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 
>GP:AAD35515 GB:AE001721 sugar ABC transporter, permease protein
[Thermotoga maritima]
Identities = 94/262 (35%), Positives = 158/262 (59%), Gaps = 1/262 (0%)

Query: 15 LILCLLTVLFIFPFYWIMTGAFKSQPDTIIIPPQWWPKAPTLENFKALTVQNPALRWLWN 74
+ + + V+F+ P ++ + +FK + PP +PK P+LE + + + L +L N
Sbjct: 9 IFIVFMLWFMLPVFYAWSSFKPMSEIYSYPPTIFPKXPSLEGYIOTIKEYDLLTYLRN 68

Query: 75 SVFISIMTMFLVCCTSSMAGYVLAKKRFYGQKILFSLFIAAMALPKQVVLVPLVRIINFM 134
++F++ + + S M GY LAK +F+G + + S+F M + QV++VPL +1 +
Sbjct: 69 TLWAWAWITVLVSWTGYGLAKGKFWGIRPVNSMFTMTMFVSAQVIMVPLFWIRSL 128

Query: 135 GIHDTLWAVILPLVGWPFGVFLMKQFSENIPTELLESAKIDGCGEIRTFINVAFPIVKPG 194
G+ ++LW +I+P V P G+F+ Q+ ++IP ELLESAKIDG E + F + FP+ KP
Sbjct: 129 GLINSLWGLIIPAVYTPTGMFMAVQYMKDIPDELLESAKIDGANEWQIFWRIVFPLSKPL 188

Query: 195 FAALAIFTFINTWNDYFMQLVMLTSRNNLTISLGVATMQAEM-ATNYGLIMAGAALAAVP 253
AALAIF+F WND+ + L+++ RN T+ L +AT+Q E + I+A + L +P
Sbjct: 189 VAAIiAI FS FTVTOWNDFVLPLLVVNRRNLYTLQIjALAT I QEEYGGAEWNTI LAFSTLTI I P 248

Query: 254 IVTVFLVFQKSFTQGITMGAVK 275
+ +FL+FQ+ F +GI G +K
Sbjct: 249 TLIIFLLFQRLFMKGIMAGGLK 270



A related DNA sequence was identified in S.pyogenes <SEQ ID 937> which encodes the amino acid
sequence <SEQ ID 938>. Analysis of this protein sequence reveals the following: 



Possible site: 40
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane Certainty=0.3548 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB59597 GB:AL132662 probable sugar transport inner membrane
protein [Streptomyces coelicolor A3(2)]
Identities = 88/262 (33%), Positives = 147/262 (55%)

Query: 15 VMLCTLTILFIFPFYWIMTGAFKAQADTIMIPPQWWPKAPTIENFKALVVQNPALKWLWN 74
++L L ++F P W++ + + A+ PP WP + ++ ++ +W N
Sbjct: 38 LLLAPIjALVFAVPLVWLVLSSVMSNAEINRFPPALWPSGIDLGGYRYVLGNAMFPRWFVN 97

Query: 75 SVFISVATMFLVCGTSSLAGYALAKKRFYGQRLLFSIFIAAMALPKQVVLVPLVRIVNFM 134
S+ +S T+ SLAGYA A+ RF G R+L + +A MA+P Q+ ++P ++ +
Sbjct: 98 SLIVSAVTVAANLVFGSLAGYAFARMRFAGSRVIMGLMLATMAVPFQLTMIPTFLVMKKL 157

Query: 135 GIHDTLAAVILPLVGWPFGVFLMKQFSENIPTELLESAKIDGCGEIRTFFNVAFPIVKPG 194
G+ DTL A+I+P + PF VFL++QF ++P EL E+A IDGC +R + + P+ +P
Sbjct: 158 GLIDTLGALIVPSLVTPFAVFLLRQFFLSLPRELEEAAWIDGCSRLRVLWRIVLPLSRPA 217

Query: 195 FAAIAIFTFINTWNDYFMQLVMLTSRENLTISLGVATMQAEMATNYGLIMAGAAMAAVPI 254
A +A+ TF+ TWND L+ + T+ LG+ T Q + T + +MAG + +P+
Sbjct: 218 LATVAVLTFLTTWNDLTWPLIAINHDTQYTLQLGLTTFQGQHHTQWAAVMAGNVITVLPV 277

Query: 255 VTVFLVFQKSFTQGITMGAVKG 276
+ FL QK+F Q IT +KG
Sbjct: 278 LLAFLGAQKTFIQSITSSGLKG 299



65 An alignment of the GAS and GBS proteins is shown below: 
Identities = 245/276 (88%), Positives = 262/276 (94%)

Query: 1 MKKKTFSAYNFLTALILCLLTVLFIFPFYWIMTGAFKSQPDTIIIPPQWWPKAPTLENFK 60
M KK +A + LT ++LC+LT+LFIFPFYWIMTGAFK+Q DTI+IPPQWWPKAPT+ENFK
Sbjct: 1 MTKKK1TASDILTTVMLCVLTILFIFPFYWIMTGAFKAQADTIMIPPQWWPKAPTIENFK 60

Query: 61 ALTVQNPALRWLWNSVFISIMTMFLVCCTSSMAGYVLAKKRFYGQKILFSLFIAAMALPK 120
AL VQNPAL+WLWNSVFIS+ TMFLVC TSS+AGY LAKKRFYGQ++LFS+FIAAMALPK
Sbjct: 61 ALVVQNPALKWLWNSVFISVATMFLVCGTSSLAGYALAKKRFYGQRLLFSIFIAAMALPK 120

Query: 121 QVVLVPLVRIINFMGIHDTLWAVILPLVGWPFGVFLMKQFSENIPTELLESAKIDGCGEI 180
QVVLVPLVRI+NFMGIHDTL AVILPLVGWPFGVFLMKQFSENIPTELLESAKIDGCGEI
Sbjct: 121 QVVLVPLVRIVNFMGIHDTLAAVILPLVGWPFGVFLMKQFSENIPTELLESAKIDGCGEI 180

Query: 181 RTFINVAFPIVKPGFAALAIFTFINTWNDYFMQLVMLTSRNNLTISLGVATMQAEMATNY 240
RTF NVAFPIVKPGFAALAIFTFINTWNDYFMQLVMLTSR NLTISLGVATMQAEMATNY
Sbjct: 181 RTFFNVAFPIVKPGFAAIAIFTFINTWNDYFMQLVMLTSRENLTISLGVATMQAEMATNY 240

Query: 241 GLIMAGAALAAVPIVTVFLVFQKSFTQGITMGAVKG 276
GLIMAGAA+AAVPIVTVFLVFQKSFTQGITMGAVKG
Sbjct: 241 GLIMAGAAMAAVPIVTVFLVFQKSFTQGITMGAVKG 276

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 292

A DNA sequence (GBSx0320) was identified in S.agalactiae <SEQ ID 939> which encodes the amino acid 
sequence <SEQ ID 940>. Analysis of this protein sequence reveals the following: 



Possible site: 31

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -10.83 Transmembrane 74 - 90 ( 64 - 96)
INTEGRAL Likelihood = -6.37 Transmembrane 108 - 124 ( 107 - 126)
INTEGRAL Likelihood = -5.84 Transmembrane 270 - 286 ( 265 - 290)
INTEGRAL Likelihood = -5.20 Transmembrane 161 - 177 ( 156 - 182)
INTEGRAL Likelihood = -0.16 Transmembrane 219 - 235 ( 219 - 235)

Final Results

bacterial membrane Certainty=0.5331 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05584 GB:AP001513 sugar transport system (permease) (binding
protein dependent transporter) [Bacillus halodurans]
Identities = 106/289 (36%), Positives = 168/289 (57%), Gaps = 6/289 (2%)

Query: 9 RETMIAYAFLAPILLFFLIFVFAPMVMGFVTSFFNYSM-TQFTFIGLANYNRMF-HDSIF 66
+E Y F+AP ++ F IF PM+ SF ++ + + + G NY R+F D +F
Sbjct: 25 KEYFWGYLFIAPPIIGFAIFALGPMLYSIYVSFTDFDLYNEPVWTGADNYYRLFVTDDLF 84

Query: 67 MKSLINTVIIVIGSVPVVVFFSLFVAANTYEKNVFSRSFYRCVFFLPVVTGSVAVTVVWK 126
K++ NT +G +P+ + SL +A +K V + +R FFLP V+ VA+T++W+
Sbjct: 85 RKTVFNTFYAALG-IPIGMAVSLGIAVALNQK-VKGIALFRTAFFLPAVSSVVAITLLWR 142

Query: 127 WIYDPMSGILNYILKSGHVIEQNISWLGDKHWALLAIIIILLTTSVGQPIILYIAAMGNI 186
WI++ G+LN +L +V WL D+ WA+ A+II + +G +ILY+AA+ +
Sbjct: 143 WIFNADFGLLNIMLN--YVGIHGPGWLSDEKWAMPAMIIQGVWGGLGINMILYLAALQGV 200

Query: 187 DNSLCEAARVDGANEMQVFWQIKWPSLLPTTLYIAVITTINSFQCFALIQLLTSGGPNYS 246
+ +L EAA +DG N Q F I PS+ PTT +I + +TI + Q F ++T GGPNYS
Sbjct: 201 NPALYEAADIDGGNAWQKFIHITVPSISPTTFFILITSTIGALQDFQRFMIMTEGGPNYS 260

Query: 247 TSTLMYYLYEKAFKLSEYGYANTMGVFLAVMIALISFAQFKILGNDVEY 295
T+T++YYL+ AF+ E GYA+ M L ++I +I+ FK+ V Y
Sbjct: 261 TTTVVYYLFLNAFRYMEMGYASAMAWVLGIIILIITIINFKLAKKWVHY 309

A related DNA sequence was identified in S.pyogenes <SEQ ID 941> which encodes the amino acid
sequence <SEQ ID 942>. Analysis of this protein sequence reveals the following:

Possible site: 13
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -5.04 Transmembrane 196 - 212 ( 190 - 216)
INTEGRAL Likelihood = -0.16 Transmembrane 253 - 269 ( 253 - 269)

Final Results

bacterial membrane Certainty=0.6095 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:BAB05584 GB:AP001513 sugar transport system (permease) (binding 
protein dependent transporter) [Bacillus halodurans] 
Identities = 113/310 (36%), Positives = 176/310 (56%), Gaps = 9/310 (2%)

Query: 25 KVEQKKEVFQVNVNKLKMR---ETLISYAFLAPVLVFFVIFVLIPMIMGFVTSFFNYSM- 80
+VE +E K K R E Y F+AP ++ F IF L PM+ SF ++ +
Sbjct: 4 EVETPRETKTTKARKQKRRLNKEYFWGYLFIAPPIIGFAIFALGPMLYSIYVSFTDFDLY 63

Query: 81 TEFTFVGFANYARMF-QDPIFMKSLINTLIIVIGSVPVVVFFSLFVAAKTYDKNVVARSF 139
E + G NY R+F D +F K++ NT +G +P+ + SL +A K V +
Sbjct: 64 NEPVWTGADNYYRLFVTDDLFRKTVFNTFYAALG-IPIGMAVSLGIAVALNQK-VKGIAL 121

Query: 140 YRAVFFLPVVTGSVAVTVVWKWIYDPMSGILNYVLKYAHVIEQNISWLGDKHWALLAIIV 199
+R FFLP V+ VA+T++W+WI++ G+LN +L Y + WL D+ WA+ A+I +
Sbjct: 122 FRTAFFLPAVSSVVAITLLWRWIFNADFGLLNIMLNYVGI--HGPGWLSDEKWAMPAMII 179

Query: 200 ILLTTSVGQPIILYIAAMGNIDNSLVEAARVDGATEFQVFWNIKWPSLLPTTLYIAVITT 259
+ +G +ILY+AA+ ++ +L EAA +DG +Q F +I PS+ PTT +I + +T
Sbjct: 180 QGVWGGLGINMILYLAALQGVNPALYEAADIDGGNAWQKFIHITVPSISPTTFFILITST 239

Query: 260 INSFQCFALIQLLTSGGPNYSTSTLMYYLYEKAFKLSEYGYANTMGVFLAVMIAIISFAQ 319
I + Q F ++T GGPNYST+T++YYL+ AF+ E GYA+ M L ++I II+
Sbjct: 240 IGALQDFQRFMIMTEGGPNYSTTTVVYYLFLNAFRYMEMGYASAMAWVLGIIILIITIIN 299

Query: 320 FKILGNDVEY 329
FK+ V Y
Sbjct: 300 FKLAKKWVHY 309





An alignment of the GAS and GBS proteins is shown below: 

Identities = 263/295 (89%), Positives = 278/295 (94%)

Query: 1 MRTNKLKMRETMIAYAFLAPILLFFLIFVFAPMVMGFVTSFFNYSMTQFTFIGLANYNRM 60
+ NKLKMRET+I+YAFLAP+L+FF+IFV PM+MGFVTSFFNYSMT+FTF+G ANY RM
Sbjct: 35 VNVNKLKMRETLISYAFLAPVLVFFVIFVLIPMIMGFVTSFFNYSMTEFTFVGFANYARM 94

Query: 61 FHDSIFMKSLINTVIIVIGSVPVVVFFSLFVAANTYEKNVFSRSFYRCVFFLPVVTGSVA 120
F D IFMKSLINT+IIVIGSVPVVVFFSLFVAA TY+KNV +RSFYR VFFLPVVTGSVA
Sbjct: 95 FQDPIFMKSLINTLIIVIGSVPVVVFFSLFVAAKTYDKNVVARSFYRAVFFLPVVTGSVA 154

Query: 121 VTVVWKWIYDPMSGILNYILKSGHVIEQNISWLGDKHWALLAIIIILLTTSVGQPIILYI 180
VTVVWKWIYDPMSGILNY+LK HVIEQNISWLGDKHWALLAII+ILLTTSVGQPIILYI
Sbjct: 155 VTVVWKWIYDPMSGILNYVLKYAHVIEQNISWLGDKHWALLAIIVILLTTSVGQPIILYI 214

Query: 181 AAMGNIDNSLCEAARVDGANEMQVFWQIKWPSLLPTTLYIAVITTINSFQCFALIQLLTS 240
AAMGNIDNSL EAARVDGA E QVFW IKWPSLLPTTLYIAVITTINSFQCFALIQLLTS
Sbjct: 215 AAMGNIDNSLVEAARVDGATEFQVFWNIKWPSLLPTTLYIAVITTINSFQCFALIQLLTS 274

Query: 241 GGPNYSTSTLMYYLYEKAFKLSEYGYANTMGVFLAVMIALISFAQFKILGNDVEY 295
GGPNYSTSTLMYYLYEKAFKLSEYGYANTMGVFLAVMIA+ISFAQFKILGNDVEY
Sbjct: 275 GGPNYSTSTLMYYLYEKAFKLSEYGYANTMGVFLAVMIAIISFAQFKILGNDVEY 329

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
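
Each "Final Results" block in these examples assigns a certainty value to three candidate locations, and the location marked "Affirmative" is the one with the highest value. The Python sketch below is illustrative only and is not part of the original analysis. It parses such rows and selects the highest-scoring location; the helper name parse_final_results is hypothetical, the pattern deliberately tolerates the stray spaces inside the numbers that text extraction introduced, and the values are those listed for SEQ ID 940 in this example.

```python
import re

# One row looks like: "bacterial membrane Certainty=0.5331 (Affirmative) < succ>".
# Optional whitespace is allowed around "=" and "." to cope with extraction noise.
RESULT_RE = re.compile(
    r"^(bacterial \w+)\s+Certainty\s*=\s*(\d+)\s*\.\s*(\d+)\s*\(([^)]*)\)",
    re.MULTILINE,
)


def parse_final_results(text):
    """Return a list of (location, certainty, verdict) tuples."""
    results = []
    for location, whole, frac, verdict in RESULT_RE.findall(text):
        results.append((location, float(f"{whole}.{frac}"), verdict.strip()))
    return results


listing = "\n".join([
    "bacterial membrane Certainty=0 . 5331 (Affirmative) < succ>",
    "bacterial outside Certainty=0. 0000 (Not Clear) < succ>",
    "bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < succ>",
])

results = parse_final_results(listing)
best = max(results, key=lambda row: row[1])  # highest certainty wins
print(best)
```

When every certainty is 0.0000, as in some of the listings, the maximum is not meaningful and the prediction is better treated as undetermined.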

Example 293

A DNA sequence (GBSx0321) was identified in S.agalactiae <SEQ ID 943> which encodes the amino acid 
sequence <SEQ ID 944>. Analysis of this protein sequence reveals the following: 



Possible site: 31

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12516 GB:Z99107 similar to sugar-binding protein [Bacillus subtilis]
Identities = 54/187 (28%), Positives = 90/187 (47%), Gaps = 14/187 (7%)

Query: 19 MFACVDSSQSVMAAEKD-KVEITWWAFPTFTQEKAKDGVGTYEKKVIKAFEKKNPNIKVK 77
MF+ + + ++D + I WW + D Y KVI+ +EKKNP++ ++
Sbjct: 1 MFSGCSAGEEASGKKEDVTLRIAWWG GQPRHD YTTKVIELYEKKNPHVHIE 51

Query: 78 LETIDFTSGPEKITTAIEAGTAPDVLFDAPGRIIQYGKNGKLADLNDLFTDQFIKDVN-- 135
E ++ +K+ AG PDV+ + QYGK +L DL D I DV+
Sbjct: 52 AEFANWDDYWKKIAPMSAAGQLPDVIQMDTAYLAQYGKKNQLEDLTPYTKDGTI-DVSSI 110

Query: 136 NKNIIQASKSGDKAYMYPISSAPFYMAFNKKMLKDAGVLKLVKEGWTTSDFEKVLKALKN 195
++N++ K +K Y + + + N+ +LK AGV + +E WT D+EK+ L+
Sbjct: 111 DENMLSGGKIDNKLYGFTLGVNVLSVIANEDLLKKAGV-SINQENWTWEDYEKLAYDLQE 169

Query: 196 KGYTPGS 202
K GS
Sbjct: 170 KAGVYGS 176

A related DNA sequence was identified in S.pyogenes <SEQ ID 945> which encodes the amino acid 
sequence <SEQ ID 946>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

!GB:Z99107 similar to sugar-binding protein [Bacillu. . . 82 2e-14 

>GP:CAB12516 GB:Z99107 similar to sugar-binding protein [Bacillus subtilis]

Identities = 105/446 (23%) , Positives = 176/446 (38%) , Gaps = 71/446 (15%) 



Query: 24 GKSQKEAGASKSDTAKTEITWWAFPVFTQEKAEDGVGTYEKKLIAAFEKANPEIKVKLET 83 
GSE+K+ IWW +D Y K+I +EK NP + ++ E 
Sbjct: 4 GCSAGEEASGKKEDVTLRIAWWG GQPRHD YTTKVIELYEKKNPHVHIEAEF 54

Query: 84 IDFTSGPEKITTAIEAGTAPDVLFDAPGRIIQYGKNGKLADLNDLFTEEFTKDVN- -NDK 141 

++ +K+ AG PDV+ + QYGK +L DL +T++ T DV+ ++ 

Sbjct: 55 ANWDDYWKKliAPMSAAGQLPDVI QMDTAYLAQYGKKNQLEDLTP - YTKDGT ID VS S I DEN 113 

Query: 142 LIQASKAGDTAYIWPISSAPFYMALNKKMLKDAGVLDLVKEGWTTDDFEKVLKALKDK-- 199 

++ K + Y + + + N+ +LK AGV + +E WT +D+EK+ L++K 

Sbjct: 114 MLSGGKIDNKLYGFTLGVNVLSVIANEDLLK3CAGV-SINQENWTWEDYEKLAYDLQEKAG 172 

Query: 200 GYNPGSFFANGQGGDQGPRAFFANLYSSHTTDDKV TKYTT 239 

G +P F +G R + + DD++ T T 

Sbjct: 173 VYGSNGMHPPDIFFPYYLRTKGERFYKEDGTGLAYQDDQLFVDYFERQLRLVKAKTSPTP 232 

Query: 240 DDANSIKAMTKISNWIKDGLMMNGSQYDGSADIQNFANGQTSFTILWAPAQPGIQAKLLE 299
D++ IK M +D ++ G SA N++N F A+L + 

Sbjct: 233 DESAQIKGM EDDFIVKGK SAITWNYSNQYLGF ARLTD 269 

Query: 300 ASKVDYLEIPFPSDDGKPELEYLVNGFAVFNNKDEQKVAASKTFIQFIADDKEWGPKNW 359 
+YLP+L+ EK A+K FI F +++E + +

Sbjct: 270 SPLSLYLP---PEQMQEKALTLKPSMLFSIPKSSEHKKEAAK-FINFFVNNEE-ANQLIK 324 

Query: 360 RTGAFPVRTSYGDLYKDKRMEK- - - IAEWTKFYSPYYNTID GFAEMRTLWFPMVQ 411 

PV DKKE+ IE++S + D G AE+ L 

Sbjct: 325 GERGVPVSDKVADAIKPKLNEEETNIVEYVETASKNISKADPPEPVGSAEVIKLLKDTSD 384

Query: 412 AVSNGDEKPEDALKAFTEKANKTIKK 437 

+ PE A K F +KAN+ +++ 

Sbjct: 385 QILYQKVSPEKAAKTFRKKANEILER 410 


An alignment of the GAS and GBS proteins is shown below: 

Identities = 352/438 (80%), Positives = 384/438 (87%), Gaps = 4/438 (0%)

Query: 1 MSIKKSVIGFCLGAAALSMFACVDSSQSVMAAEKD---KVEITWWAFPTFTQEKAKDGVG 57
M++KK LGA+ L + AC SQ A K K EITWWAFP FTQEKA+DGVG
Sbjct: 1 MSMKKLASLAMLGASVLGLAACGGKSQKEAGASKSDTAKTEITWWAFPVFTQEKAEDGVG 60

Query: 58 TYEKKVIKAFEKKNPNIKVKLETIDFTSGPEKITTAIEAGTAPDVLFDAPGRIIQYGKNG 117
TYEKK+I AFEK NP IKVKLETIDFTSGPEKITTAIEAGTAPDVLFDAPGRIIQYGKNG
Sbjct: 61 TYEKKLIAAFEKANPEIKVKLETIDFTSGPEKITTAIEAGTAPDVLFDAPGRIIQYGKNG 120

Query: 118 KLADLNDLFTDQFIKDVNNKNIIQASKSGDKAYMYPISSAPFYMAFNKKMLKDAGVLKLV 177
KLADLNDLFT++F KDVNN +IQASK+GD AYMYPISSAPFYMA NKKMLKDAGVL LV
Sbjct: 121 KLADI^TOLFTEEFTKDVN^KLIQASKAGDTAWPISSAPFYMAIMKKMLKDAG 180

Query: 178 KEGWTTSDFEKVLKALKNKGYTPGSFFANGQGGDQGPRAFFANLYSAPITDKEVTKYTTD 237
KEGWTT DFEKVLKALK+KGY PGSFFANGQGGDQGPRAFFANLYS+ ITD +VTKYTTD
Sbjct: 181 KEGWTTDDFEKVLKALKDKGYNPGSFFANGQGGDQGPRAFFANLYSSHITDDKVTKYTTD 240

Query: 238 TKNSVKSMKKIVEWIKKGYLMNGSQYDGSADIQNFANGQTAFTILWAPAQPKTQAKLLES 297
NS+K+M KI WIK G +MNGSQYDGSADIQNFANGQT+FTILWAPAQP QAKLLE+
Sbjct: 241 DANSIKAMTKISNWIKDGLMMNGSQYDGSADIQNFANGQTSFTILWAPAQPGIQAKLLEA 300

Query: 298 SKVDYLEVPFPSEDGKPDLEYLVNGFAVFNNKDENKVKASKKFITFIADDKKWGPKDVIR 357
SKVDYLE+PFPS+DGKP+LEYLVNGFAVFNNKDE KV ASK FI FIADDK+WGPK+V+R
Sbjct: 301 SKVDYLEIPFPSDDGKPELEYLVNGFAVFNNKDEQKVAASKTFIQFIADDKEWGPKNVVR 360

Query: 358 TGAFPVRTSFGDLYKGDKRMMKISKWTQYYSPYYNTIDGFSEMRTLWFPMVQSVSNGDEK 417
TGAFPVRTS+GDLYK DKRM KI++WT++YSPYYNTIDGF+EMRTLWFPMVQ+VSNGDEK
Sbjct: 361 TGAFPVRTSYGDLYK-DKRMEKIAEWTKFYSPYYNTIDGFAEMRTLWFPMVQAVSNGDEK 419

Query: 418 PADALKDFTQKANDTIKK 435
P DALK FT+KAN TIKK
Sbjct: 420 PEDALKAFTEKANKTIKK 437
A related GBS gene <SEQ ID 8537> and protein <SEQ ID 8538> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: 5.05 
GvH: Signal Score (-7.5): 4.69 

Possible site: 31 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 7.69 threshold: 0.0 
PERIPHERAL Likelihood =7.69 90 
modified ALOM score: -2.04 



*** Reasoning Step: 3 

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

28.8/48.4% over 409aa 

Bacillus subtilis 

EGAD|107689| hypothetical protein Insert characterized

GP|2633010|emb|CAB12516.1||Z99107 similar to sugar-binding protein Insert characterized
PIR|F69796|F69796 sugar-binding protein homolog yesO - Insert characterized

ORF01146(355 - 1605 of 1914)

EGAD|107689|BS0697(1 - 410 of 412) hypothetical protein {Bacillus subtilis}
GP|2633010|emb|CAB12516.1||Z99107 similar to sugar-binding protein {Bacillus subtilis}
PIR|F69796|F69796 sugar-binding protein homolog yesO - Bacillus subtilis
%Match =5.4 

%Identity =28.8 %Similarity =48.3 

Matches = 69 Mismatches = 116 Conservative Sub.s = 47 



318 348 378 435 465 495 525 

RGIVMSIKKSVIGFCLGAAALSMFACVDSSQSVMAAEKD-KVEITWWAFPTFTQEKAKDGVGTYEKKVIKAFEKKNPNIK 
11= ==l = I II I |||: :|||||=: 

MFSGCSAGEEASGKKEDVTLRIAWW GGQPRHDYTTKVIELYEKKNPHVH 

10 20 30- 40 

555 585 615 645 675 705 732 762 

VKLETIDFTSGPEKITTAIEAGTAPDVLFDAPGRIIQYGKNGKLADLNDLFTDQFIKDVN-NKNIIQASKSGDKAYMYPI 
= = I == =1= II 111= = Mil =1 II 11= ::):= I =1 I = : 

IEAEFANWDDYWKKLAPMSAAGQLPDVIQMDTAYLAQYGKKNQLEDLTPYTKDGTIDVSSIDENMLSGGKIDNKLYGFTL 
60 70 80 90 100 110 120 

792 822 852 882 912 942 972 

SSAPFYMAFNKKMLKDAGVLKLVKEGWTTSDFEKVLKALKNKGYTPGSFFANGQGGDQGPRAFFANLYSA



GVNVLSVIANEDLLKKAGV- S INQENWTWEDYEKLAYDLQEK AGVYGSNGM- - - HPPDI FFPYYLRTKGERFYKEDG 

140 150 160 170 180 190 200 

990 1020 1050 1080 

PITDKEVTKYTTDTKNSVKSMKKIVEWIKKGYLMNGSQYDGSA 

1= II I II = = I 

TGLAYQDDQL NIVEYVETASKNISKADPPEPVGSAEVTKLLKDTSDQILYQKV 

350 360 370 380 390 



1515 1545 1575 1605 1635 1665 1695 1725 

FSEMRTLWFPMVQSVSNGDEKPADALKDFTQKANDTIKKAAK*LRRLLFYGQSHIGIEEEFLVKLRCKGEYRMRTNKLK 

I I I I =111= === 
SPEKAAKTFRKKANEILERNN 
SEQ ID 944 (GBS16) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 3 (lane 9; MW 49kDa).

The GBS16-His fusion product was purified (Figure 92A; see also Figure 189, lane 9) and used to immunise
mice (lane 1+2 product; 20ug/mouse). The resulting antiserum was used for Western blot (Figure 92B),
FACS (Figure 92C), and in the in vivo passive protection assay (Table III). These tests confirm that the
protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 294 

A DNA sequence (GBSx0322) was identified in S.agalactiae <SEQ ID 947> which encodes the amino acid
sequence <SEQ ID 948>. Analysis of this protein sequence reveals the following: 

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9459> which encodes amino acid sequence <SEQ ID 9460>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC66999 GB:AE001166 conserved hypothetical protein [Borrelia
burgdorferi]
Identities = 107/225 (47%), Positives = 147/225 (64%), Gaps = 6/225 (2%)

Query: 12 QIKNGIIVSCQALPGEPLYTESGGVMPLLALAAQEAGAVGIRANSVRDIKEIQEVTNLPI 71
+IK G+IVSCQAL EPL+ S +M +ALAA+ GA+GIRAN V DI +I+ +LPI
Sbjct: 6 KIKRGLIVSCQALENEPLH--SSFIMSKMAIAAKIGGAIGIRANGVNDISQIKLEVDLPI 63

Query: 72 IGIIKREYPPQEPFITATMTEVDQIASLDIAVIALDCTLRERHDGLSVVEFIQKIKRKYP 131
IGIIK+ Y + FIT TM E+D+L + + +IALD T R R DG+ + +F + IK+KYP
Sbjct: 64 IGIIKKNYNNCDVFITPTMKEIDELCNEGVDIIALDATFRNRPDGVLLDDFFENIKKKYP 123

Query: 132 EQLLMADISTFEEGKNAFEAGVDFVGTTLSGYTDYSR--QEEGPDIELLNKLCQAGI--D 187
+Q LMADIS+ +E NA + G DF+GTTL GYT + D L L + +
Sbjct: 124 KQCLMADISSLDEAINADKLGFDFIGTTLYGYTKNTNGLNIADNDFNFLRTLLNSNLKST 183

Query: 188 VIAEGKIHTPKQANEINHIGVAGIWGGAITRPKEIAERFISGLS 232
+I EGKI TP +A + +GV +WGGAITRP EI ++F+ ++
Sbjct: 184 LIVEGKIDTPLKAQKCFEMGVDLVWGGAITRPAEITKKFVEKIN 228

A related DNA sequence was identified in S.pyogenes <SEQ ID 949> which encodes the amino acid 
sequence <SEQ ID 950>. Analysis of this protein sequence reveals the following: 

Possible site: 44

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.49 Transmembrane 175 - 191 ( 175 - 192)

Final Results

bacterial membrane Certainty=0.1595 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
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The protein has homology with the following sequences in the databases: 

>GP:AAD28762 GB:AF130859 putative N-acetylmannosamine-6-P epimerase 
[Clostridium perfringens] 
Identities = 113/225 (50%) , Positives = 148/225 (65%) , Gaps = 5/225 (2%) 

5 

Query: 10 LMEQLKGGIIVSCQALPGEPLYSETGGIMPLMAKAAQEAGAVGIRANSVRDIKEIQAITD 69 

+++ +KG +IVSCQAL EPL+S IM MA AA++ GA IRA + DI EI+ +T 
Sbjct: 1 MLDWKGNLIVSCQALSDEPLHSSF--IMGRMAIAAKQGGAAAIRAQGIDDINEIKEVTK 58 

Query: 70 LPIIGIIKOTYPPQEPFITATMTEVDQLAAENIAVIAMDCTKRDRHDGLDIASFIRQVKE 129

LPIIGIIK++Y E +IT TM EVD+L + +1 +D TKR R +G +1 + + 
Sbjct: 59 LPIIGIIKRNYDDSEIYITPTMKEVDELLKTDCEMIGLDATKRKRPNGENIKDLVDAIHA 118 

Query: 130 KYPNQLLMADISTFDEGLVAHQAGIDFVGTTLSGYTPYSRQEAGPDVALIEALCK-AGIA 188 
K +L MADIST +EG+ A + G D V TTLSGYTPYS+Q D L+E L K I

Sbjct: 119 K--GRIAMADISTLEEGIEAEKLGFDCTSTTLSGYTPYSKQSNSVDFELLEELVKTVKIP 176 

Query: 189 VIAEGKIHSPEEAKKINDLGVAGIWGGAITRPKEIAERFIEALK 233 
VI EG+I++PEE KK DLG WGGAITRP++I +RF + LK 
Sbjct: 177 VICEGRINTPEELKKALDLGAYSAWGGAITRPQQITKRFTDILK 221

An alignment of the GAS and GBS proteins is shown below: 

Identities = 172/227 (75%) , Positives = 202/227 (88%) 

Query: 5 SKEAFKKQIKNGIIVSCQALPGEPLYTESGGVMPLLALAAQEAGAVGIRANSVRDIKEIQ 64

+KE +Q+K GIIVSCQALPGEPLY+E+GG+MPL+A AAQEAGAVGIRANSVRDIKEIQ 
Sbjct: 6 TKEKLMEQLKGGIIVSCQALPGEPLYSETGGIMPLMAKAAQEAGAVGIRANSVRDIKEIQ 65 

Query: 65 EVTNLPIIGIIKREYPPQEPFITATMTEVDQIASLDIAVIALDCTLRERHDGLSVVEFIQ 124 
+T+LPIIGIIK++YPPQEPFITATMTEVDQLA+L+IAVIA+DCT R+RHDGL + FI+

Sbjct: 65 AITDLPIIGIIKKDYPPQEPFITAlMTEVDQLflALNIAVIftMXn'KRDRHDGLDIASFIR 125 

Query: 125 KI KRKYPEQLLMAD I STFEEGKNRFEAGVDFVGTTLSGYTDYSRQEEGPDIEIJjNKLCQA 184 
++K KYP QLLMADISTF+EG A +AG+DFVGTTLSGYT YSRQE GPD+ L+ LC+A 
Sbjct: 126 QVICEKYPNQLLMADISTFDEGLVAHQAGIDFVGTTLSGYTPYSRQEAGPDVALIEALCKA 185

Query: 185 GIDVIAEGKIHTPKQANEINHIGVAGIWGGAITRPKEIAERFISGL 231 

GI VIAEGKIH+P++A +IN +GVAGIWGGAITRPKEIAERFI L 
Sbjct: 186 GIAVIAEGKIHSPEEAKKINDLGVAGIWGGAITRPKEIAERFIEAL 232 
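
The percentages in a summary line such as the one heading the alignment above can be re-derived from the raw counts. The following Python sketch is illustrative only and is not part of the analysis described here. The function name `parse_blast_stats` is hypothetical, and integer truncation of the percentage is an assumption (it reproduces the 75% and 88% reported for 172/227 and 202/227).

```python
import re

# Matches "Identities = 172/227 (75%)" and the Positives/Gaps equivalents,
# tolerating the loose spacing found in the extracted text.
STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")


def parse_blast_stats(line):
    """Return {name: (count, total, reported_pct, recomputed_pct)}."""
    stats = {}
    for name, count, total, pct in STAT.findall(line):
        count, total = int(count), int(total)
        # Assumption: the percentage is truncated, not rounded.
        stats[name] = (count, total, int(pct), count * 100 // total)
    return stats


line = "Identities = 172/227 (75%) , Positives = 202/227 (88%)"
for name, (count, total, reported, recomputed) in parse_blast_stats(line).items():
    print(f"{name}: {count}/{total} reported {reported}% recomputed {recomputed}%")
```

A mismatch between the reported and recomputed values would suggest that a digit in the summary line was damaged during text extraction.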

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 295 

A DNA sequence (GBSx0323) was identified in S.agalactiae <SEQ ID 951> which encodes the amino acid
sequence <SEQ ID 952>. This protein is predicted to be group B streptococcal surface immunogenic
protein. Analysis of this protein sequence reveals the following:

Possible site: 25 

>>> Seems to have a cleavable N-term signal seq. 

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
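
A "Final Results" block of this kind can be read mechanically by taking the localisation with the highest certainty. The sketch below is a hedged illustration, not the program that produced these results. The names `parse_final_results` and `RESULT` are hypothetical, and the tolerance for stray spaces inside the number is an assumption made to cope with text-extraction damage.

```python
import re

# "bacterial <location> [---] Certainty=<value> (<verdict>)"
RESULT = re.compile(r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\((.+?)\)")


def parse_final_results(text):
    """Return a list of (location, certainty, verdict) tuples."""
    results = []
    for location, value, verdict in RESULT.findall(text):
        # Assumption: spaces inside the number are extraction noise.
        certainty = float(value.replace(" ", ""))
        results.append((location, certainty, verdict))
    return results


block = """
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0. 0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0 . 0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
best = max(results, key=lambda r: r[1])
print(best)
```

For the block above, the highest-certainty entry is the "outside" localisation, in line with the cleavable N-terminal signal sequence that was predicted.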

A related DNA sequence was identified in S.pyogenes <SEQ ID 953> which encodes the amino acid
sequence <SEQ ID 954>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 182/437 (41%) , Positives = 240/437 (54%) , Gaps = 53/437 (12%) 

Query: 1 MKMNKKVLLTSTMAASLLSVASVQAQETDTTWTARTVSEVKADLVKQDNKSSYTVKYGDT 60 

M + KK L +++A SL+ +A+ QAQE WT R+V+E+K++LV DN +YTVKYGDT 
Sbjct: 1 MIITKKSLFVTSVALSLVPLATAQAQE WTPRSVTEIKSELVLVDNVFTYTVKYGDT 56 

Query: 61 LSVISEAMSIDMNVIAKINNIADINLIYPETTLTVTYDQKSHTATSMKIETPATNAAGQT 120 

LS I+EAM ID++VL IN+IA+I+LI+P+T LT Y+Q AT++ ++ PA++ A + 
Sbjct: 57 LSTIAEAMGIDVHVLGDINHIANIDLIFPDTILTANYNQHGQ-ATNLTVQAPASSPASVS 115 

Query: 121 TATVDLKTNQVS VADQKVSLNTI SEGMTP - EAATTI VSPMKTYSSAPALKSKE VIAQEQA 179 

Q S Q ++ TP + TT + K SS A S E+ + 

Sbjct: 116 HVPSSEPLPQASATSQPTV- - PMAPPATPSDVPTTPFASAKPDSS VTA- -SSELTSSTND 171 

Query: 180 VSQAAANEQVSPAPWSITSEVPAAKEEVKPTQTSVSQSTTVSPASVAAETPAPVAKVAP 239 

VS ++E V PAE TVT+SA+APP + 

Sbjct: 172 VSTELSSESQKQPEVPQEAVPTPKAAE TTEVEPKTDISEAPTSANRPVPNESASE 226 

Query: 240 VRTVAAPRVASVKWTPKVETGASPEHVSAPAVP VTTTSPATDSKLQATEVKSVPVA 296 

+ AAP + A E SAPA TTS AT + L 
Sbjct: 227 EVSSAAP AQAPAEKEETSAPAAQKAVADTTSVATSNGL 264 

Query: 297 QKAPTATPVAQPASTTNAVAAHPENMLQPHVAAYKEKVASTYGVNEFSTYRAGDPGDHG 356 

AP A +P NAGLQP AA+KE+VAS +G+ FS YR GDPGDHG 

Sbjct: 265 SYAPNH AYNPMNAGLQPQTAAFKEEVASAFGITSFSGYRPGDPGDHG 311 

Query: 357 KGIATOFIVGTNQALGNKVAQYSTQNMAANNISYVIWQQKFYS 416 

KGLA+DF+V N ALG++VAQY+ +MA ISYVIW+Q+FY+ SIYGPA TWN MPD 
Sbjct: 312 KGLAIDFMVPENSALGDQVAQYAIDHMAERGISYVIWKQRFYAPFASIYGPAYTWNPMPD 371 

Query: 417 RGGVTANHYDHVHVSFN 433 

RG +T NHYDHVHVSFN 
Sbjct: 372 RGSITENHYDHVHVSFN 388 
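
In alignments of this kind the middle line conventionally carries a letter where the two residues are identical, a `+` where the substitution is scored as conservative, and a blank otherwise. The following Python sketch counts identities and positives for a single row under that convention, using the final row of the alignment above as input. The function name `score_row` is hypothetical, and the code assumes the three strings are column-aligned, which is often not true of the extraction-damaged rows in this text.

```python
def score_row(query, midline, sbjct):
    """Count (identities, positives) for one aligned row."""
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("query, midline and sbjct must have equal length")
    identities = positives = 0
    for q, m, s in zip(query, midline, sbjct):
        if m.isalpha():
            # A letter in the midline must equal both aligned residues.
            if not (q == m == s):
                raise ValueError(f"midline letter {m!r} disagrees with {q!r}/{s!r}")
            identities += 1
            positives += 1
        elif m == "+":
            # Conservative substitution: counted as positive only.
            positives += 1
    return identities, positives


query = "RGGVTANHYDHVHVSFN"
midline = "RG +T NHYDHVHVSFN"
sbjct = "RGSITENHYDHVHVSFN"
print(score_row(query, midline, sbjct))  # (14, 15)
```

Summing these per-row counts over a whole alignment gives the numerators of the "Identities" and "Positives" figures in its summary line.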

A related GBS gene <SEQ ID 8539> and protein <SEQ ID 8540> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
SRCFLG: 0 

McG: Length of UR: 20 

Peak Value of UR: 1.96 

Net Charge of CR: 2 
McG: Discrim Score: 2.95 
GvH: Signal Score (-7.5): 3.84 

Possible site: 23 
>>> Seems to have a cleavable N-term signal seq. 
Amino Acid Composition: calculated from 24 
ALOM program count: 0 value: 4.29 threshold: 0.0 
PERIPHERAL Likelihood =4.29 58 
modified ALOM score: -1.36 

*** Reasoning Step: 3 

Rule gpol 



Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8540 (GBS322) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 77 (lane 9; MW 52kDa). The GBS322-His fusion product was purified (Figure
214, lane 10) and used to immunise mice. The resulting antiserum was used for FACS (Figure 267), which
confirmed that the protein is immunoaccessible on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 296 

A DNA sequence (GBSx0324) was identified in S.agalactiae <SEQ ID 955> which encodes the amino acid
sequence <SEQ ID 956>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -1.86 Transmembrane 5 - 21 ( 4-21) 

Final Results

bacterial membrane --- Certainty=0.1744 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
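
The "INTEGRAL Likelihood" line in the analysis above gives a score and the residue range of the predicted transmembrane segment. A minimal Python sketch for extracting those values follows. The names `TM` and `parse_integral` are hypothetical, the parenthesised second range is ignored, and the segment length is computed as an inclusive residue count, which is an assumption about how the range should be read.

```python
import re

# "INTEGRAL Likelihood = -1.86 Transmembrane 5 - 21 ( 4-21)"
TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_integral(line):
    """Return (likelihood, start, end, length) or None if the line does not match."""
    match = TM.search(line)
    if match is None:
        return None
    likelihood = float(match.group(1))
    start, end = int(match.group(2)), int(match.group(3))
    # Inclusive residue count of the predicted segment (assumption).
    return likelihood, start, end, end - start + 1


print(parse_integral("INTEGRAL Likelihood = -1.86 Transmembrane 5 - 21 ( 4-21)"))
# (-1.86, 5, 21, 17)
```

Here the segment spans residues 5 to 21, which overlaps the predicted uncleavable N-terminal signal sequence and is consistent with the membrane localisation reported in the Final Results.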

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC46072 GB:U50357 zoocin A endopeptidase [Streptococcus 
zooepidemicus] 

Identities = 163/274 (59%) , Positives = 196/274 (71%) , Gaps = 11/274 (4%) 

Query: 25 vLADTyVRPIDNGRITTGFNGYPGHCGVrjYAVPrGTIIRAVADGTVKFAGAGANFSWMTD 84 

V A TY RP+D G ITTGFNGYPGH GVDYAVP GT +RAVA+GTVKFAG GAN WM 
Sbjct: 21 VSAATYTRPLDTGNITTGFNGYPGHVGVDYAVPVGTPVl^VANGTVKFAGNGANHPWMLW 80 

Query: 85 LAGNCVMIQHADGMHSGYAHMSRWARTGEKVKQGDIIGYVGATGMATGPHLHFEFLPAN 144

+AGNCV+IQHADGMH+GYAH+S++ T VKQG IIGY GATG TGPHLHFE LPAN 
Sbjct: 81 MAGNCVLIQHADGMHTGYAHLSKISVSTDSTVKQGQIIGYTGATGQVTGPHLHFEMLPAN 140 

Query: 145 PNFQNGFHGRINPTSLIANVATFSGKTQASAPSIKPLQSAPVQNQSSKLKVYRVDELQKV 204
PN+QNGF GRI+PT IAN F+G T + P N LK+Y+VD+LQK+

Sbjct: 141 PNWQNGFSGRIDPTGYIANAPVFNGTTPTE PTTPTTN LKIYKVDDLQKI 189 

Query: 205 NGWLvTCNNTLTPTGFDWNDNGIPASEIDEvBANGNLTADQVLQKGGYFIFNPKTLKTVE 264 
NG+W V+NN L PT F W DNGI A ++ EV +NG T+DQVLQKGGYF+ NP +K+V 
Sbjct: 190 NGIWQVR^ILVPTDFTWVDNGIAADDVIEVTSNGTRTSDQVLQKGGYFVINPNNVKSVG 249

Query: 265 KPIQGTAGLTWAKTRFANGSSVWLRVDNSQELLY 298 

P++G+ GL+WA+ F G +VWL + LLY 
Sbjct: 250 TPMKGSGGLSWAQVNFTTGGNVWLNTTSKDNLLY 283 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8541> and protein <SEQ ID 8542> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: 6.63
GvH: Signal Score (-7.5): -2.97
Possible site: 23
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -1.86 threshold: 0.0
INTEGRAL Likelihood = -1.86 Transmembrane 5 - 21 ( 4-21)
PERIPHERAL Likelihood = 5.57 50
modified ALOM score: 0.87

*** Reasoning Step: 3 

Final Results

bacterial membrane --- Certainty=0.1744 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

GP|2804351|gb|AAC46072.1||U50357 (21 - 283 of 285) zoocin A endopeptidase {Streptococcus
zooepidemicus}
%Match = 34.2
%Identity = 61.3 %Similarity = 74.4
Matches = 163 Mismatches = 65 Conservative Sub.s = 35

144 174 204 234 264 294 324 354 

VV*VFLS*LRYTTYILKTFLFIKPPKYSSR*VLFLIF*FKFSNKLIASV*ALHYINSIWRFFLNKWLVKASSLVVLGGMV 

MKRIFFAFLSLCLF

10 

384 414 444 474 504 534 564 594 

LSAGSRvIADTYWPIDNGRITTGFNGYPGHCGVDYAVPTGTIIRAVADGTVXFAGAGANFSWMTDIAGNCVMIQHADGM 

: I I || ||:| I lllllllllll lllllll II :||||:||||||| III II : I I I I I : I I I I I I I

1FGTQTVSA&TYTRPLDTGNITTGFNGYPGHVGVDYAVPVGTPVR 

30 40 50 60 70 80 90 

624 654 684 714 744 774 804 834 

HSGYAHMSRVVARTGEKVKQGDIIGYVGATGMATGPHL^^



HTGYAHLSKISVSTDSTVKQGQIIGYTGATGQVTGPHLHFEMLPANPNWQNGFSGRIDPTGYIANAPVFNGTT 

110 120 130 140 150 160 

864 894 924 954 984 1014 1044 1074

KPLQSAPVQNQSSKLKVYRVDELQKVNGVWLVTOSIOTLT 

I : I ' == I |:|: I hi II: I hi hi I I II I I Mil I =.= II =11 h I I I I I I I I I 1= I I 
-P--TEP-TTPTTNLKIYKVDDLQKINGIWQVRNNILVPTDFTWVDNGIAADDVIEOTSNGTRTSDQVLQKGGYEV 
180 190 200 210 220 230 240 

1104 1134 1164 1194 1224 1254 1284 1314 

TLKTVEKPIQGTAGLTWAKTRFANGSSWLRVDNSQELLYK*FEVM 

:| = l h = h Ihlh I I =111 = III 
NVKSVGTPMKGSGGLSWAQVNFTTGGNVWLNTTSKDNIjLYGK 

260 270 280

SEQ ID 8542 (GBS36) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 11 (lane 4; MW 34.1kDa).

GBS36-His was purified as shown in Figure 192, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 297 

A DNA sequence (GBSx0325) was identified in S.agalactiae <SEQ ID 957> which encodes the amino acid
sequence <SEQ ID 958>. This protein is predicted to be phosphoribosylaminoimidazolecarboxamide
formyltransferase/IMP cyclohyd. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2815 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB04352 GB:AP001509 phosphoribosylaminoimidazolecarboxamide
formyltransferase/IMP cyclohydrolase [Bacillus halodurans]
Identities = 310/515 (60%), Positives = 390/515 (75%), Gaps = 4/515 (0%)



Query: 1 MTKRMjISVSDKSGIIDFAKELKNLGWDIISTGGTKVALDDAGVETIAIDDVTGFPEMMD 60
M +RAL+SVS+K GI+ FAK L +I+STGGTK AL +AG+ I DVTGFPE++D
Sbjct: 1

Query: 61 GRVKTLHPNIHGGLIARRDADSH^ 120
GRVKTLHPNIHGGLLA R+ D HL +++I ID WVNLYPF++TI +P+ T+ A+
Sbjct: 61

Query: 121 ENIDIGGPSMLRSAAKJJHASVTWVDSAOT 180
ENIDIGGPSMLR+AAKNH VTWVD DY TVL ELAD +T++RLAAK FRHTA
Sbjct: 121 ENIulGGPSMIjKAAAKWHytt^ 180

Query: 181 AYDM.IAEYFTAQVGEAKPSKLTITYDLKQJ^YGENPQQDADFYQKALPTDYSIASAKQ 240
AYDA+IAEY T VGE PE LT+T++ KQ +RYGENP Q A FYQK L SIA AKQ
Sbjct: 181 AYUAMIAEYLTDAVGEESPESLT VTF&r^QDIJiKx^SEiNPHy^I i? xy^PlA^AKAb i-AHAKQ

Query: 241
L+GKELS+NNI DADAA+ I+++FK+ P VA+KHMNPCG+G + 1+ A+D AYEADPV
Sbjct: 241 LHGKELSYNNINDMAALSIVKEPKE-PAAVAVKHMNPCGV^ 299

Query: 301 SIFGGIVVXNREVDAATAEKMHPIFLEIIIAPSYSEEAIiAIIiTNKKKNLRILELPFDAQA 360
SIFGGI+ LNREVD TA+ + IFLEIIIAPS+SEEAL +LT+ KKNLR+L LP + +
Sbjct: 300 SIFGGIIAI^EVDVBTAKTLKEIFLEIIIAPSFSEEa^^ 357

Query: 361 ASETOAEYTGWGGLLVQNQDWAENPSDWQWTDRQPTEQEATALEFAWKAIKyVKSNG 420
++ E T + GG LVQ +D ++ ++ T R+PTE E AL+ AW+ ,+K+VKSN
Sbjct: 358 -NQAEKRITSIHGGALVQEEOTYGFEEAEIKIPTKREPTEAEWEALKLAWRVVKHVKSNA 416

Query: 421 IIimiHMTLGLGAGQTNRVGSVKIAIEQAKDHLDGAVLASDAFFPFADNIEElAAAGIK 480
I++ 4 MT+G+GAGQ NRVG+ KIAIEQA + G+V+ SDAFFP D +E A AGI
Sbjct: 417 IVTiADGQMWGVGAGQMNRVGAAKIAIEQAGEKAAGSVMGSDAFFPMGDTVELAAKAGIT 476

Query: 481 AIIQPGGSVRDQESIDAANKHGLTMIFTGVRHFRH 515
AIIQPGGS+RD+ESI+ A+KHG+ M+FTGVRHF+H
Sbjct: 477 AIIQPGGSIRDEESIENADKHGIAMVFTGVRHFKH 511

A related DNA sequence was identified in S.pyogenes <SEQ ID 959> which encodes the amino acid 
sequence <SEQ ID 960>. Analysis of this protein sequence reveals the following: 

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2932 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 500/515 (97%) , Positives = 507/515 (98%) 

Query: 1 MTKRALISVSDKSGlIDFARELKNLGTOIISTGGTKVAIZlDAGvETIAIDDVTGFPEMMD 60 

MTKRALISVSDKSGI+DFAKEBKNLGWDIISTGGTKV LDDAGVETIAIDDVT FPEMMD 
Sbjct: 1 MTKRALISVSDKSGIVDFAKEBKNLGWDIISTGGTKVITjDDAGvETIAIDDvTRFPEM^ 60 

Query: 61 GRVKTLHPNIHGCSLLARRDADSHLQAAKBNNIELIDLvvvOT.YPFKETILRPDVTYDIAV 120 



WO 02/34771 



PCT/GB01/04789 



-381- 



GRVKTLHPNIHGGLLARRDADSHLQAAKD™iELIDLVVVMLYPFKETILRPD+TYDIia.V
Sbjct: 61 GRVKTLHPNIHGGLLARRDADSHLQAAKDNNIELIDLVWNLYPFKETILRPDITYDLAV 120

Query: 121 bJNlUlta^i^bMbKoAAJlNrlAoVI V V VDbADxAlVIj(aE.bADAby 1 1 r K 1 KyKLAAKAr Kri J. A 180
ENI D I GGP SMLRS AAKNHAS VTWVD ADYATVLGELADA QTTF+TRQRLAAK FRHTA
Sbjct: 121 ENIDIGGPSMLRSAAKNHASVTVVVDPADYATVLGELADA^ 180

Query: 181 AYDAJj I AE Y FTAQVQiEAKPEKLT I T YDL KQAMRYGEN P Q QDADFYQKALPTDYSIASAKQ 240
AYDALI AEYFT QVGEAKPEKLTITYDLKQAMRYGENPQQDADFYQKALPTDYS IASAKQ
Sbjct: 181 AYDALIAEYFTTQVGEAKPEKLTITYDLKQAMRYGENPQQDADFYQKALPTDYSIASAKQ 240

Query: 241 LNGKELSFNNIRDADAAIRIIRDFKDSPTWALKHP^P 300
LNGKELSFNNIRDADAAIRIIRDFKD PTWALKHMNPCGIGQADDIETAWDY Y+ADPV
Sbjct: 241 LNGKELSFNNIRDMAAIRIIRDFKDRPTWALKHMNPCGIGQADDIETAWDYTYKADPV 300

Query: 301 SIFGGIVVIiNREVDAATAEKMHPIFLEIIIAPSYSEEALA^ 360
SIFGGI+VIJSIREVDAATA+KMHPIFLEIIIAPSYSEFJU^ILTNKKKNLRILELPFDAQA
Sbjct: 301 SIFGGIIVLNREVDAATAKKMHPIFLEIIIAPSYSEEAIAILTNKKKNLRILELPFDAQA 360

Query: 361 ASEVEAEYTGWGGLLVQNQDWAENPSDWQWTDRQPTEQEATALEFAWKAIKYVKSNG 420
ASEVEAEYTGWGGLLVQNQDWAENPSDWQWTDRQPTEQEATALEFAWKAIKYVKSNG
Sbjct: 361 ASEVEAEYTGWGGLLVQNQDWAENPSDWQWTDRQPTEQEATALEFAWKAIKYVKSNG 420

Query: 421 IIITNDHMTLGLGAGQTNRVGSVKIAIEQAKDHLDGAVLASDAFFPFADNIEEIAAAGIK 480
IIITNDHMTLGLGAGQTNRVGSVKIAIEQAKDHLDGAVIASDAFFPFADNIEEIAAAGIK
Sbjct: 421 IIITNDHMTLGLGAGQTNRVGSVKIAIEQAKDHLDGAVLASDAFFPFADNIEEIAAAGIK 480

Query: 481 AIIQPGGSVRDQESIDAANKHGLTMIFTGVRHFRH 515
AIIQPGGSVRDQ+SIDAANKHGLTMIFTGVRHFRH
Sbjct: 481 AIIQPGGSVRDQDSIDAANKHGLTMIFTGVRHFRH 515

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 298 

A DNA sequence (GBSx0326) was identified in S.agalactiae <SEQ ID 961> which encodes the amino acid 
sequence <SEQ ID 962>. This protein is predicted to be similar to antibiotic resistance protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1842 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12342 GB:Z99106 similar to antibiotic resistance protein 
[Bacillus subtilis] 
Identities = 65/263 (24%) , Positives = 117/263 (43%) , Gaps = 34/263 (12%) 

Query: 5 KNLEIVESIFGD-WDETIIWSCV-QGIMGEVFVDSLDQPKSSLAKLGRKSSFGFIAGQPT 62 

K ++++F D + T ++S + Q I G V+ D PKS +G +S F+AG 
Sbjct: 10 KKYSSLKTMFDDKYCPTFVYSILDQTIPGAVYADDQTFPKSFF- - IGTESGIYFIAGDQG 67 

Query: 63 LFLLEVCSGEDIILVPQHKGWSDLIESTYGQNAHSFKRYATKKDTLFERS 112 

+ +V S + L W +++ + + +R A + 
Sbjct: 68 NRDFHDFIAGYYEEQVKSSKRFTLFSSSKTWDSVLKPILKDDLNQMRRAA.FSY QP 122 

Query: 113 RLEKFVTQLPNGFELRAIDEKV YNSCLEKEWSQDLVANYATYQYYKKQGIGYW 166 

+ K QLP G L+ IDE + +NS +E+ + + + +G G+ V 

Sbjct: 123 KSFKKTLQLPKGL VLKRIDEDI I SHSTAFNSAYYEEY WNSVSQFASKGFGFAV 175

Query: 167 YYQGNIIAGASSYSTYKNGIEIEVDTHPDFRRRGLATIVAAQLILTCLDKGIYPSWDAH- 225

+ ++++ + s N E+++ T ++R GLA VA + I C++ GI PSWD 

Sbjct: 176 LHGNHWSECTS I FLGHNRAEMD I YTIiEEYRGLGLAYCVANRFI AFCMENGI VPSWDCD I 235 

Query: 226 -TRTSLNLSEKLGYEFSHEYIAY 247 

+S+ L+ KLG++ EY Y 
Sbjct: 236 CNNSSIALAAKLGFKTVTEYTIY 258 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 299 

A DNA sequence (GBSx0328) was identified in S.agalactiae <SEQ ID 963> which encodes the amino acid
sequence <SEQ ID 964>. This protein is predicted to be phosphoribosylglycinamide formyltransferase
homolog (purN). Analysis of this protein sequence reveals the following:

Possible site: 48 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0736 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 965> which encodes the amino acid
sequence <SEQ ID 966>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.53 Transmembrane 75 - 91 ( 75 - 91) 



Final Results

bacterial membrane --- Certainty=0.1213 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA04374 GB:AJ000883 purD [Lactococcus lactis] 
Identities = 236/419 (56%) , Positives = 301/419 (71%) , Gaps = 7/419 (1%) 

Query: 50 LKLLWGSGGREHAIAKKLLASKGVDQVFVAPGNDGMTLDGLDLVNIWSEHSRLIAFAK 109
+K+LV+GSGGREHA+AKK + S V++VFVAPGN GM DG+ +V+I + +L+ FA+
Sbjct: 1 MKILVIGSGGREHALAKKFMESPQVEEVFVAPGNSGMEKDGIQIVHISELSNDKLVKFAQ 60

Query: 110
I F+GP+ AL G+VD F A L FGP K AAELE SKDFAK IM KY VPTA
Sbjct: 61

Query: 170
Y TF E A AY++E+G P+V+KADGLA GKGV VA +E A A ++ F S
Sbjct: 121 YATFDSLEPALAYLDEKGVPLVIKADGLAAGKGVTVAFDIETAKSALADI FSGSQ 175

Query: 230
+WIEEFLDGEEFSLF+F + K Y MP AQDHKRAFD DKGPNTGGMGAY+PV H+ +
Sbjct: 176

Query: 290
W+ A+E +V+P + GM+ EG+ + GVLY GLILT DG K IEFN+RFGDPETQ+ +LPR
Sbjct: 236

Query: 350 LTSDFAQNIDDIMMGIEPYITWQKDGVTLGWVASEGYPFDYEKGVPLPEKTDGDIITYY 409

L SD AQ I DI+ G EP + W + GVTLGVWA+EGYP + G+ LPE +G + YY 
Sbjct: 296 LKSDLAQAI IDIIAGNEPTLEWLESG VTLGvWAftEGYPSQAKLGLILPEI PEG- LNVYY 354 

Query: 410 AGVKFSENSELLLSNGGRVYMLVTTEDSVKAGQDKIYTQLAQQDTTGLFYRNDIGSKAI 468 

AGV +EN++ L+S+GGRVY++ T + VK+ Q +Y +L + + G FYR+DIGS+AI 
Sbjct: 355 AGVSKNENNQ-LISSGGRVYLVSETGEDVKSTQKLLYEKLDKLENDGFFYRHDIGSRAI 412 

An alignment of the GAS and GBS proteins is shown below:

Identities = 172/182 (94%) , Positives = 176/182 (96%) 

Query: 1 MKIAVFASGNGSNFQVIAEQFQVSFVFSDHRDAYVLERAQNIAIPSFAFELKEFENKAAY 60 
MKIAVFASGNGSNFQVIAEQF VSFVFSDHRDAYVLERAQNLA1PSFAFELKEFENK AY 
Sbjct: 1 MKIAVFASGNGSNFQVIAEQFPVSFVFSDHRDAYVLERAQNLAIPSFAFELKEFENKVAY 60

Query: 61 EQAWDLLDKHEIDLVCLAGYMKIVGETLLSAYEGRIINIHPTYLPEFPGAHGIKDAWEA 120 

EQA+VDLLDKHEIDLVCLAGYMKIVGETLL AYE RIINIHP YLPEFPGAHGI +DAWEA 
Sbjct: 61 EQAIVDLLDKHEIDLVCIiAGYMKIVGETLLLAYERRI INIHPAYLPEFPGAHGIEDAWEA 120 

Query: 121 GVDQSGVTIHWVDSGVDTGQVIQQVHVPRLADDSLESFETRIHETEYQLYPAVLDSLGIK 180 

GVDQSGVTIHWVDSGVDTGQVIQQV VPRLADDSLES FETRI HETEYQLYPA VLDSLG+ + 
Sbjct: 121 GVDQSGVT I HWVDSGVDTGQVI QQVRVPRLADDSLES FETRI HETEYQLYPA VLDSLGVE 180 

Query: 181 RK 182

RK 

Sbjct: 181 RK 182 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 300 

A DNA sequence (GBSx0329) was identified in S.agalactiae <SEQ ID 967> which encodes the amino acid 
sequence <SEQ ID 968>. Analysis of this protein sequence reveals the following: 

Possible site: 52
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.59 Transmembrane 121 - 137 ( 121 - 137) 

Final Results

bacterial membrane --- Certainty=0.1235 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC16901 GB:AF016634 phosphoribosylformylglycinamide
cyclo-ligase [Lactococcus lactis subsp. cremoris]

Identities = 253/338 (74%) , Positives = 288/338 (84%) , Gaps = 4/338 (1%) 

Query: 4 KNAYAQSGTOVEAGYEVVERIKKHVARTERAGVMGALGGFGGMFDLSQTGVKEPVLISGT 63 
+NAYA+SGVDVEAGYEW RIKKHVA+TER GV+GALGGFGG FDLS VKEPVLISGT 
Sbjct: 5 EmYAKSGVDVEAGYEWSRIKKHVAKTERLGVLGALGGFGGSFDLSVLDVKEPVLISGT 64

Query: 64 DGVGTKLMIAIKYDKHDTIGQDCVAMCVNDIIAAGAEPLYFLDYVATGKNEPAKLEQWA 123 

DGVGTKLMLAI+ DKHDTIG DCVAMCVNDIIAAGAEPLYFLDY+ATGKN P KLEQWA 
Sbjct: 65 DGVGTKLMIAIRADKHDTIGIDCVAMCViroilAAGAEPLYFLDYIATGKNIPEKLEQvVA 124 



Query: 124 GVAEGCTQASAALIGGETAEMPGWifGEDDYDIAGFAVGVAEKSQIIDGSK-VKEGDILLG 182 

GVAEGC+QA AALIGGETAEMPGMY EDDYDLAGFAVGVAEKSQ+IDG K V+ GD+LLG 
Sbjct: 125 GVAEGCLQAGAALIGGETAEMPGMYDEDDYDIjAGFAVGVAEKSQLIDGEKDVEAGDVLLG 184 



60 Query: 



183 



LASSGIHSNGYSLVRRVFADYTGDEVLPELEGKQLKDVLLEPTRIYVKAALPLIKEELVN 242 
LASSGIHSNGYSLVR+VFAD+ +E LPEL+ + L D LL PT+IYVK LPLIK+ + 
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Sbjct: 185 LASSGIHSNGYSLWKVFADFDLNESLPELD-QSLIDTLLTPTKIYVKELLPLIKQNKIK 243 

Query: 243 GIAHITGGGFIEOTPRMFMDMAEIDEDKVPVLPIFKALEKYGDIKHEEMFEIFNMGVG 302 

GIAHITGGGF EN+PRMF + L+AEI E VLPIFKALEKYG IKHEEM+EIFNMG+G 
Sbjct: 244 GIAHITGGGFHENLPRMFGNSLSAEIVEGSWDVLPIFKALEKYGSIKHEEMYEIFNMGIG 303 

Query: 303 LMLDVNPENVDRVKELLDEPVYEIGRI IKKftDDSWIK 340 

+++ V PEN +K+ L+ +EIG+++ + + WIK 
Sbjct: 304 MVIAVAPENAAALKKELN--AFEIGQMVNRQEAPWIK 339

A related DNA sequence was identified in S.pyogenes <SEQ ID 969> which encodes the amino acid 
sequence <SEQ ID 970>. Analysis of this protein sequence reveals the following: 

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3236 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 321/340 (94%) , Positives = 332/340 (97%) 



Query: 1 MSEKNAYAQSGVDVFAGYEVVERIKKHVARTERAGVMGALGGFGGMFDLSQTGVKEPVLI 60
MSEKNAYA+SGVnVEAGYEVvERIKKlWARTERAGvTvlGALGGFGGMFDLS+TGVKEPVL+
Sbjct: 1 MSEKNAYAKSGVnVEAGYEWERIKKHVARTERAGVMGALGGFGGMFDLSKTGVKEPVLV 60

Query: 61 SGTDGVGTKLMLAIKYDKHDTIGQDCVAMCVNDIIAAGAEPLYFLDYVATGKNEPAKLEQ 120
SGTDGVGTKLMLAI KYDKHDTIGQDCVAMCVNDI IARGREPLYFLDY+ATGKN P KLE+
Sbjct: 61 SGTDGVGTKLM1AIKYDKHDTIGQDCVAMCVNDIIAAGAEPLYFLDYIATGKNNPVKLEE 120

Query: 121 WAGVAEGCVQASAALIGGETAEMPGMYGEDDYDLAGFAVGVAEKSQIIDGSKVKEGDIL 180
W+GVAEGCVQA AALIGGETAEMPGMYG+DDYDLAGFAVGVAEKSQIIDGSKVKEGDIL
Sbjct: 121 WSGVAEGCVQAGAALIGGETAEMPGMYGQDDYDLAGFAVGVAEKSQIIDGSKVKEGDIL 180

Query: 181 LGI^SGIHSNGYSLVRRVFADYTGDEVLPELEGKQLKDVLLEPTRIYVKAALPLIKEEL 240
LGLASSGIHSNGYSLVRRVFADYTG E+LPELEGKQLKDVLLEPTRIYVKAALPLIKEEL
Sbjct: 181 LGIASSGIHSNGYSLVRRVFADYTGKELLPELEGKQLKDVLLEPTRIYVKAALPLIKEEL 240

Query: 241 VNGIAHITGGGFIENVPRMFADDIAAEIDEDKVPVLPIFKALEKYGDIKHEEMFEIFNMG 300
V GI HITGGGFIEN+PRMFADDLAAEIDEDKVPVLPIFKALEKYGDIKHEEMFEIFNMG
Sbjct: 241 VKGIGHITGGGFIENIPRMFADDLAaEIDEDKVPVLPIFKALEKYGDIKHEEMFEIFNMG 300

Query: 301 VGLMLDVNPENVDRVKELLDEPVYEIGRIIKKADDSWIK 340
VGLML V+PENV+RVKELLDEPVYEIGRIIKKAD SWIK
Sbjct: 301 VGLMLAVSPENVNRVKELLDEPVYEIGRIIKKADASWIK 340

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 301 

A DNA sequence (GBSx0330) was identified in S.agalactiae <SEQ ID 971> which encodes the amino acid 
sequence <SEQ ID 972>. This protein is predicted to be phosphoribosylpyrophosphate amidotransferase 
(purF). Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1112 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD12627 GB:U64311 phosphoribosylpyrophosphate amidotransferase
[Lactococcus lactis]

Identities = 340/470 (72%) , Positives = 404/470 (85%) , Gaps = 6/470 (1%) 

Query: 3 YEVKSLNEECGVFGIWGYPQAAQ VTYFGLHSLQHRGQEGAGI I SNDNGKLYGYRNVGLLS 62 
+E K+LNEECG+ FG+WG+ P AA++TYFGLH+LQHRGQEGAGI+ N+NGKL +R +GL++ 
Sbjct: 37 FEAKTLNEECGLFGWGHPDAMLTYFGLHMQHRGQEGAGILVNNNGKI^KHRGLGLVT 96

Query: 63 EVFKNQSELDNLTGNAAIGHVRYATAGSADIRNIQPFLYKFHDGQFALCHNGNLTNAISS 122 

EVF+++ +L+ LTG+ +AIGHVRYATAGSA+ 1 NIQPF ++FHDG h HNGNLTNA S 
Sbjct: 97 EVFRHEKDLEELTGSSAIGHVRYATAGSANIHMIQPFQFEFHDGSLGLAHNGNLTNAQSL 156 

Query: 123 RKELEKQGAIFNASSDTEILMHLIRRSHNPSFMGKVKEALSTVKGGFAYLLMTEDKLIAA 182 

R ELEK GAIF+++SDTEILMHLIRRSH+P FMG+VKEAL+TVKGGFAYL+MTE+ ++AA 
Sbjct: 157 RCELEKSGAI FSSNSDTE I LMHLI RRSHHPEFMGRVKEALNTVKGGFAYLIMTENS I VAA 216 

Query: 183 LDPNAFRPLSIGQMQNGAWVISSETCAFEWGAKWVRDVEPGEVILIDDSGIQCDRYTDE 242

LDPN FRPLSIG+M NGA V++SETCAF+WGA W++DV+PGE+I I+D GI D++TD 
Sbjct: 217 LDPNGFRPLSIGKMSNGALWASETCAFDWGATWIQDVQPGEIIEINDDGIHVDQFTDS 276 

Query: 243 TQLAICSMEYVYFARPDSTIHGVNVHTARKNMGKRLAQEFKQDADIVIGVPNSSLSAAMG 302 
T + I CSMEY+ YFARPDS I GVNVHTARK GK IAQE K DADIVIGVPNSSLSAA G

Sbjct: 277 TNMTICSMEYIYFARPDSNIAGVNVHTARKRSGKILAQEAKIDADIVIGVPNSSLSAASG 336 

Query: 303 FAEESGLPNEMGLVKNQYTQRTFIQPTQELREQGVRMKLSAVSGWKGKRWMIDDSIVR 362 
+AEESGLP EMGL+KNQY RTFIQPTQEIiREQGVRMKLSAV GW+GKRV+M+DDSIVR 
Sbjct: 337 YAEESGLPYEMGLIKNQYVARTFIQPTQELREQGVRMKLSAVRGVVEGKRVIMVDDSIVR 396

Query: 363 GTTSRRIVGLLREAGaTEVHVAIASPELKYPCFYGIDIOTRRELISANHAVDEVCDIIGA 422 

GTTSRRIV LL++AGA EVHVAIASP LKYPCFYGIDIQ R ELI+A H DE+ + IGA 
Sbjct: 397 GTTSRRIVKDLKDAGAAEVHVAIASPALKYPCFYGIDIQDRDELIAATHTTDEIREAIGA 456 

Query: 423 DSLTYLSIDGL1KSIGLETKAPNGGLCVAYFDGHYPTPLYDYEEEYLRSL 472 

DSLTYLS GL+++IG + LC++YFDG YPTPLYDYE +YL SL 

Sbjct: 457 DSLTYLSQSGLVEAIG HDKLCLSYFDGEYPTPLYDYEADYLESL 500 

A related DNA sequence was identified in S.pyogenes <SEQ ID 973> which encodes the amino acid
sequence <SEQ ID 974>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0610 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 473/484 (97%) , Positives = 481/484 (98%) 

Query: 1 MTYEVKSLNEECGVFGIWGYPQAAQ\OTFGLHSLQHRGQEGAGIISNDNGKLYGYRNVGL 60 
MTYEVKSLNEECGVFGIWG+PQAAQVTYFGIiHSLQHRGQEGAGI+SNDNGKIjYGYRNVGL 
Sbjct: 20 MTYEWSLNEECGVFGIWGHPQAAQVTyFGIjHSLQHRGQEGAGIVSNDNGKLYGYRNVGL 79

Query: 61 LSEVFKNQSELDNLTGNAAIGHVRYATAGSADIRNIQPFLYKFHDGQFALCHNGNLTNAI 120 

LSEVFKNQSELDNLTGNAAIGHVRYATAGSADIRNIQPFLYKFHDGQFALCHNGNLTNAI 
Sbjct: 80 LSEVFKNQSELDNLTGNAAIGHVRYATAGSADIRNIQPFLYKFHDGQFALCHNGNLTNAI 139 



Query: 121 SSRKELEKQ<^IFNASSDTEILMHLIRRSHNPSFMGKVKFjALSTVKGGFAYLLMTEDKLI 180 

S RKELEKQGAIFNASSDTEILMHLIRRSHN SFMGKVKEAL+TVKGGFAYLLMTE+KLI 
Sbjct: 140 SLRKELEKQGAIFNASSDTEILMHLIRRSHNSSFMGKVKEALNTVKGGFAYLLMTENKLI 199

Query: 181 AALDPNAFRPLSIGQMQNGAWVISSETCAFEWGAKWVRDVEPGEVILIDDSGIQCDRYT 240

AALDPNAFRPLSIGQMQNGAWVISSETCAFEWGAKWVRDVEPGEVILIDD GIQCDRYT 
Sbjct: 200 AALDPNAFRPLSIGQMQNGAOTISSETCAFEWGAKWVRDVEPGEVILIDDRGIQCDRYT 259 

Query: 241 DETQLAICSMEYVYFARPDSTIHGVNVHTARKNMGKRLAQEFKQDADIVIGVPNSSLSAA 300

DETQLAICSMEYVYFARPDSTIHGVNVHTARKNMGKRLAQEFKQDADIVIGVPNSSLSAA 
Sbjct: 260 DETQLAICSMEYVYFARPDSTIHGVNVHTARIOMGKRLAQEFKQDADIVIGVPNSSLSAA 319 

Query: 301 MGFAEESGLPNEMGLVKNQYTQRTFIQPTQELREQGVRMKLSAVSGWKGKRWMIDDSI 360 
MGFAEESGLPNEMGLVKNQYTQRTFIQPTQELREQGVRMKLSAVSGWKGKRWMIDDSI

Sbjct: 320 MGFAEESGLPNEMGLVKNQYTQRTFIQPTQELREQGVRMKLSAVSGWKGKRWMIDDSI 379 

Query: 361 TOGTTSRRIVGLLREAGATEVHVAIASPELKYPCFYGIDIQTRRELISANHAVDEVCDII 420 
VRGTTSRRIVGLLREAGA+EVHVAIASPELKYPCFYGIDIQTRRELISANH+VDEVCDII 
Sbjct: 380 VRGTTSRRIVGLLREAGASEVHVAIASPELKYPCFYGIDIQTRRELISANHSVDEVCDII 439

Query: 421 GADSLTYLSIDGLIKSIGLETKAPNGGLCVAYFDGHYPTPLYDYEEEYLRSLEEKTSFYI 480 

GADSLTYLS+DGLI+SIGLETKAPNGGLCVAYFDGHYPTPLYDYEEEYLRSLEEKTSFYI 
Sbjct: 440 GADSLTYLSLDGLIESIGLETKAPNGGLCVAYFDGHYPTPLYDYEEEYLRSLEEKTSFYI 499 



Query: 481 QKVK 484 
QKVK 

Sbjct: 500 QKVK 503 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 302 

A DNA sequence (GBSx0331) was identified in S.agalactiae <SEQ ID 975> which encodes the amino acid 
sequence <SEQ ID 976>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4797 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 303 

A DNA sequence (GBSx0332) was identified in S.agalactiae <SEQ ID 977> which encodes the amino acid 
sequence <SEQ ID 978>. Analysis of this protein sequence reveals the following: 

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3489 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 304 

A DNA sequence (GBSx0333) was identified in S.agalactiae <SEQ ID 979> which encodes the amino acid 
sequence <SEQ ID 980>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1690 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC12194 GB:AL445066 phosphoribosylformylglycinamidine synthase 
related protein [Thermoplasma acidophilum] 
Identities = 199/746 (26%), Positives = 329/746 (43%), Gaps = 103/746 (13%) 






Query: 202 ADD--FAAYKAEQGLAMEVDDLLFIQDYFKSIGRVPTETELKVLDTYWSDHCRHTTFETE 259 

ADD A GLA+ +D++ ++ YF+ +GR P + E+ + WS+HC + + + 

Sbjct: 11 ADDARLKAISKRLGLALSLDEMKAVRSYFERLGRDPIDAEIHAVAQSWSEHCSYKSSKYY 70 



Query: 260 LKNIDFSASKFQKQLQATYDKYIAMRDELGRSEKPQTLMDMATIFGRYERANGRLDDMEV 319 
LK K+ L+ Y +AM D+ G 
Sbjct: 71 LK KYLGSLKTDYT - 1 LAMEDDAG 92 



Query: 320 SDEINACSVEIEVDVDGVKEPWLLMFKNETHNHPTEIEPFGGAATCIGGAIRDPLSGRSY 379 

VD DG + + K E+HNHP+ +EP+GGAAT IGG +RD L + 
Sbjct: 93 -VVDFDG---EYAYVLKMESHNHPSAVEPYGGAATGIGGIVRDVLCMGAQ 138 

Query: 380 VYQAMRISGAGDITTPIAETRAGKLPQQVISKTAAHGYSSYGNQIGLATTYVREYFHPGF 439 

+ GD+++ E G L + I G YGN+IG+ YF + 

Sbjct: 139 PVALIDSLFLGDVSSDRYE GLLSPRYIFGGWGGIRDYGNRIGIPNVAGSLYFDKLY 195 

Query: 440 VAKRMELGAWGAAPKENVTOEKP-EAGDVVvLLGGKTGRDGVGGATGSSKVQTVESVET 498 

+ + VG ++ +VR K + GDV+VL+GGKTGRDG+ G +S + ++ 

Sbjct: 196 NSNPLWAGCVGIVRRDRIVRSKSYKPGDVLVLMGGKTGRDGIHGVNFASTTLG-KVTKS 254 

Query: 499 AGAEVQKGNAIEERKIQRLFRDGNVTRLIKKSNDFGAGGVCVAIGELAD GLEIDLD 554 

+ +Q GN I E+ + + + N LI+ D G GG+ A E+ G EI LD 

Sbjct: 255 SRLAIQLGNPI VEQPMIKAVLEANDAGIiIRAMKDLGGGGLSSAATEMVYAGGFGAEITLD 314 

Query: 555 KVPLKYQGLNGTEIAISESQERMSWVGPSDVDAFIAACNKENIDAVWATVTEKPNLVM 614 
+ LK ++G EI ISESQERM + P DV+ K N+D V+ VT + + 

Sbjct: 315 DIKLKESNMSGWEIWISESQERMLMECYPEDVEKIRQIAEKWNLDFSVIGQVTADRRIRV 374 

Query: 615 TVWGETIVDLERCFLDTNGV-RWVDAKVVDKDLWPEARTTSAETLEADMLKVLSDLNH 673 

+ I+D++ FLD + V + K V+K +TVP+ E L + + ++ LN 

Sbjct: 375 YYKKRKIIDMDIEFLDDSPVYQRPYRIKEVEKSvTVPQ EPEDLNSFVRDFMARLNT 430 



Query: 674 ASQKGLQTIFDSSVGRSTVNHPIGGR-YQITPTESSVQKLPVQYGVTTTASVMAQGYNPY 732 

++ + + D +V ST+ P GR + T +++V K P++ + V+ G P 

Sbjct: 431 C^FNVWQYDHTVRGSTIVTPFVGRPNKETHADATVIK-PLENSM--RGLVLTSGSRPN 487 



Query: 733 IAEWSPYHGAAYAVIEATARLVATGADWSRARFSYQEYFERMDKQAERFGQPVSALLGSI 792 
+ PY G + EA +++TG R ++ E GQ V ++ 

Sbjct: 488 MVSVDPYAGTLLTLAEAYKNILSTG GRPHSWDALNFGNPEREEIMGQFVESVRAIG 544 



Query: 793 EAQIQFGLPSIGGKDSMSGTFEELTVPPTLVAFGVTTADS-RKVLSPEFKAAGENIY 848 
+ + GLP + G S + + + PT V D R+ + K +G IY 
Sbjct: 545 DFCRKMGLPWAGNVSFYNEYRKTDIMPTPTIMMVGLIDDVRRSRTTYMKGSGNAIYLIG 604 
Query: 849 YIPGQAISEDIDFDLIKANF--SQFFAIQAQHKITAASAVKYGG 890 

Y G + D+D +F S+ + I + H +++ GG 
Sbjct: 605 EPCDNLTGSEYSRMHGYTDGFLPAPDLDELTRIRDFLSSKADMILSSHDVSS GG 658 

Query: 891 VLESLALMTFGNRIGASVEIAELDSS 916 

+ +L+ M+FG+ IG V+I+ + ++ 
Sbjct: 659 LFAALSEMSFGSGIGFHVDISNVSAA 684 

A related DNA sequence was identified in S.pyogenes <SEQ ID 981> which encodes the amino acid 
sequence <SEQ ID 982>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1415 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 1219/1256 (97%), Positives = 1226/1256 (97%) 

Query: 11 SSYFRVAPLSDLVSYMNKRIFVEKKADFGIKSASLVKELTHNLQLASLKDLRIVQVYDVF 70 
SSYF VAPLSDLVSYMNKRI FVEKKADFGIKSASLVKELTHNLQL SLK LRIVQVYDVF 
Sbjct: 2 SSYFPvAPLSDLVSYMNKRIFVEKKADFGIKSASLVKELTHNLQLTSLKALRIVQVYDVF 61 

Query: 71 
NLAEDLLARAEKHIFSEQVTD LLTE EITAELDKVAFFAIEALPGQFDQRAASSQEALL 
Sbjct: 62 NLAEDLLARAEKHIFSEQVTDCLLTETEITAELDKVAFFAIEALPGQFDQRAASSQEALL 121 

Query: 131 
L GSDSQVKVNTAQLYLVNKDI EAELEAVKNYLLNPVDSRFKDITLPLE QAFSVSDKT 
Sbjct: 122 

Query: 191 
I NLDFFETYQADDFA YKAEQGIAMEVDDLLFIQ+YFKSIG VPTETELKVLDTYWSDH 
Sbjct: 182 

Query: 251 
CRHTTFETELKNIDFSASKFQKQLQ TYDKYIAMRDELGRSEKPQTLMDMATI FGRYERA 
Sbjct: 242 

Query: 311 
NGRLDDMEVSDEINACSVEIEVDVDGVKEPWLLMFKNETHNHPTEIEPFGGAATCIGGAI 
Sbjct: 302 

Query: 371 
RDPLSGRSYVYQAMRISGAGDITTPIAETRAGKLPQQVISKTAAHGYSSYGNQIGLATTY 
Sbjct: 362 

Query: 431 
VREYFHPGFVAKRMELGAWGAAPKENWREKPEAGDW+LLGGKTGRDGVGGATGSSKV 
Sbjct: 422 

Query: 491 
QTVESVETAGAEVQKGNAIEERKIQRLFRDGNVTRLIKKSNDFGAGGVCVAIGELADGLE 
Sbjct: 482 

Query: 551 
IDLDKVPLKYQGLNGTE IAI SESQERMSWV P+DVDAFIAACNKENIDAVWATVTEKP 
Sbjct: 542 

Query: 611 
NLVMTWNGE IVDLER FLDTNGVRWVDAKVA/DKDLTVPEARTTSAETLEAD LKVLSD 
Sbjct: 602 



WO 02/34771 



-389- 



PCT/GB01/04789 



Query: 671 LNHASQKGLQTIFDSSVGRSTVNHPIGGRYQITPTESSVQKLPVQYGVTTTASVMAQGYN 730 

IiNHASQKGLQTIFDSSVGRSTVIfflPIGGRYQITPTESSVQKLPVQ+GVTTTASVMAQGYN 
Sbjct: 662 LNHASQKGLQTIFDSSVGRSTVOTPIGGRYQITPTESSVQKLPVQHGVTTTASVMAQGYN 721 

Query: 731 PYIAEWSPYHGAAYAVIEATARLVATGADWSRARFSYQEYFERMDKQAERFGQPVSALLG 790 

PYIAEWSPYHGAAYAVIEATARLVATGADWSRARFSYQEYFERMDKQAERFGQPVSALLG 
Sbjct: 722 PYIAEWSPYHGAAYAVIEATARLVATGADWSRARFSYQEYFERMDKQAERFGQPVSALLG 781 

Query: 791 SIEAQIQFGLPSIGGKDSMSGTFEELTVPPTLVA.FGVTTADSRKVLSPEFKAAGENIYYI 850 
SIEAQIQ GLPSIGGKDSMSGTFE+LTVPPTLVAFGVTTADSRKVLSPEFKAAGENIYYI 

Sbjct: 782 SIEAQIQLGLPSIGGKDSMSGTFEDLTVPPTLVAFGVTTADSRKVLSPEFKAAGENIYYI 841 

Query: 851 PGQAISEDIDFDLIKANFSQFEAIQAQHKITAASAVKYGGVLESLALMTFGNRIGASVEI 910 
PGQAI SEDIDFDLI K NFSQFEAIQAQHKITAASA KYGGVLESLALMTFGNRIGASVEI 
Sbjct: 842 PGQAISEDIDFDLIKDNFSQFEAIQAQHKITAASAAKYGGVLESLALMTFGNRIGASVEI 901 

Query: 911 AELDSSLTAQLGGFVFTSVEEIADVVKIGQTQADFTVTVNGNDLAGASLLSAFEGKLEEV 970 

AELDSSLTAQLGGFVFTS EEIAD VKIGQTQADFTVTVNGNDLAGASLL+AFEGKLEEV 
Sbjct: 902 AELDSSLTAQLGGFVFTSAEEIADAVKIGQTQADFTVTVNGNDLAGASLIiAAFEGKLEEV 961 


Query: 971 YPTEFEQVDAIEEVPAWSDWIKAKEIIEKPVWIPVFPGTNSEYDSAKAFEQVGASVN 1030 

YPTEFEQ D +EEVPAWSD VIKAKE IEKPWYIPVFPGTNSEYDSAKAFEQVGASVN 
Sbjct: 962 YPTEFEQTDVLEEVPAWSDTVIKAKETIEKPWYIPVFPGTNSEYDSAKAFEQVGASVN 1021 

Query: 1031 LVPFOTLNEAAIAESVDTWANIAKANIIFFAGGFSAADEPDGSAKFIVNILLlffiron^ 1090 

LVPFVTLNE AIAESVDTMVANIAKANI IFFAGGFSAADEPDGSAKFIVNILLNEKVRAA 
Sbjct: 1022 LVPFWIWEVAIAESVDTMVANIAKANIIFFAGGFSAADEPDGSAKFIVNILLNEKVRAA 1081 

Query: 1091 IDSFIEKGGLIIGICNGFQALVKSGLLPYGNFEEAGETSPTLFYNDANQHVAKMVETRIA 1150 
IDSFIEKGGLIIGICNGFQALVKSGLLPYGNFEEAGETSPTLFYNDANQHVAKMVETRIA 

Sbjct: 1082 IDSFIEKHGLIIGICNGFQAIjVKSGIjLPYGNFEEAGETSPTLFYNDANQHVAKMVETRIA 1141 

Query: 1151 NTNSPWLAGVEVGDIHVIPVSHGEGKFWSASEFAELRDNGQIWSQYVDFDGQPSMDSKY 1210 
NTNSPWIAGVEVGDIH ipvshgegk WSASEFAELRDNGQIWSQYVDFDGQPSMDSKY 
Sbjct: 1142 OTNSPWIAGVEVGDIHAIPVSHGEGKLWSASEFAELRDNGQIWSQYVDFDGQPSMDSKY 1201 

Query: 1211 NPNGSWAIEGITSKNGQIIGKMGHSERWEDGLFQNIPGNKDQKLFESAVKYFTGK 1266 

NPNGSVNAIEGITSKNGQIIGKMGHSERWEDGLFQNIPGNKDQ LF SAVKYFTGK 
Sbjct: 1202 NPNGSVNAIEGITSKNGQIIGKMGHSERWEDGLFQNIPGNKDQILFASAVKYFTGK 1257 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 305 

A DNA sequence (GBSx0334) was identified in S.agalactiae <SEQ ID 983> which encodes the amino acid 
sequence <SEQ ID 984>. This protein is predicted to be phosphoribosylaminoimidazole- 
succinocarboxamide synthase (purC). Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4783 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA03540 GB:L15190 SAICAR synthetase [Streptococcus pneumoniae] 
Identities = 183/231 (79%) , Positives = 203/231 (87%) 

Query: 1 MTNQLIYTGKAKDIYSTKDEWIRTVYKDQATMIjNGARKETIDGKGAIiNNQISSLIFEKL 60 
M+ QLIY+GKAKDIY+T+DEN+I + YKDQAT NG +KE I GKG LNNQISS IFEKL 

Sbjct: 1 MSKQLIYSGKAKDIYTTEDENLI ISTYKDC^TAFNGVKKEQIAGKGVLNNQI SSFI FEKL 60 
Query: 61 NM&GWTHYIEQI SKNEQIiNKKVDI I PLE VvLRNVTAGSFSKRFG VEEGHVLETPI VEFY 120 

N AGV TH++E++S EQLNKKV IIPLEWLRN TAGSFSKRFGV+EG LETPIVEFY 
Sbjct: 61 NAAGVATHFVEKLSDTEQLNKKVKIIPLEWLRNYTAGSFSKRFGVDEGIALETPIVEFY 120 

Query: 121 YKIOtfljNDPFINDEHVKFLGIvNDEEIAYLKGETR^^ 180 

YKND+L+DPFINDEHVKFL I +D++IAYLK E R INELLK WFA+IGL LIDFKLEFG 
Sbjct: 121 YKNDDLDDPFINDEHVKFLQIADDQQIAYLKEEARRIIffiLLICVWFAEIGLKLIDFKLEFG 180 

Query: 181 FDKDGKIILADEFSPDNCRLWDADGraMDKDVFRRDLGSLTDVYQVVLEKL 231 

FDKDGKI ILADEFSPDNCRLWDADGNHMDKDVFRR LG LTDVY++V EKL 
Sbjct: 181 FDKDGKIILADEFSPDNCRLWDADGNHMDKDVFRRGLGELTDVYEIVWEKL 231 

A related DNA sequence was identified in S.pyogenes <SEQ ID 985> which encodes the amino acid 
sequence <SEQ ID 986>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3935 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 221/234 (94%) , Positives = 228/234 (96%) 

Query: 1 MTNQLIYTGKAKDIYSTKDENVIRTVYKDQATMLNGARKETIDGKGAI^NNQISSLIFEKL 60 

+TNQLIY GKAKDIYSTKDENVIRTVYKDQATMLNGARKETIDGEGAIiOTQISSLIFEKIi 
Sbjct: 11 VTNQLIYKGKAKDIYSTKDENVIRTVYKDQATMLNGARKETIDGKGALiNNQISSLIFEKL 70 

Query: 61 NMAGVWHYIEQISKNEQLNKKVDIIPLEvAnjRNVTAGSFSKRFGVEEGHVLETPIVEFY 120 

N AGVOTHYIEQISKNEQLNKKVDIIPLEVVLRNWAGSFSKRFGVEEGHVLETPIVEFY 
Sbjct: 71 NKAGvVTHYIEQISKNEQLNKKVDIIPLEWLRNVTAGSFSKRFGVEEGHVLETPIVEFY 130 

Query: 121 YKlTONinSIDPFlNDEHVKFLGI VNDEEIAYLKGETRHINELLKDWFAQIGLNLIDFKLEFG 180 

YKND+L+DPFINDEHVKFLGIVNDEEIAYLKGETR INELLK WFAQIGLNLIDFKLEFG 
Sbjct: 131 YKNDDLDDPFINDEHVKFLGIVNDEEIAYLKGETRRINELLKGWFAQIGLNLIDFKLEFG 190 

Query: 181 FDKDGKIILADEFSPDNCRLWDADGNHMDKDVFRRDLGSLTDVYQVvLEKLIAL 234 

FD++G IILADEFSPDNCRLWD +GNHMDKDVFRRDLG+LTDVYQWLEKLIAL 
Sbjct: 191 FDQEGTIILADEFSPDNCRLWDKNGNHMDKDVFRRDLGNLTDVYQWLEKLIAL 244 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
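
Each alignment in these Examples is introduced by a summary line of the form "Identities = a/b (p%), Positives = c/d (q%)", sometimes with a Gaps field. The sketch below is an illustration only: it extracts the counts and the reported percentages from such a line so that alignments can be compared programmatically. The names `parse_summary` and `fraction` are hypothetical. The code reports the percentages exactly as printed and does not attempt to reproduce how the alignment program rounded them.

```python
import re

# One field of an alignment summary line, e.g. "Identities = 221/234 (94%)".
# Extra spaces around "=", "/" and "%" are tolerated.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name.lower(): (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }


def fraction(entry):
    """Exact ratio count / aligned_length for one parsed field."""
    count, length, _reported = entry
    return count / length


summary = parse_summary("Identities = 221/234 (94%) , Positives = 228/234 (96%)")
print(summary["identities"], round(fraction(summary["identities"]), 4))
```

Keeping the raw counts alongside the printed percentage makes it possible to rank homologs by exact ratio, for example the S. pneumoniae SAICAR synthetase hit (183/231) against the GAS protein (221/234).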

Example 306 

A DNA sequence (GBSx0335) was identified in S.agalactiae <SEQ ID 987> which encodes the amino acid 
sequence <SEQ ID 988>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2779 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9457> which encodes amino acid sequence <SEQ ID 9458> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 
>GP:AAC35700 GB:AF041468 acyl carrier protein [Guillardia theta] 
Identities = 27/75 (36%) , Positives = 52/75 (69%) 

Query: 12 MSRDEVFEKMLELLRQQLGDPQLDITPESSLHDDLAIDSIALTEFIINLEDVFHLEIPDE 71 
M+ E+FEK+ ++ +QLG + +T +++ +DL DS+ E ++ +E+ F++EIPD+ 

Sbjct: 1 rMEQEIFEKVQTIISEQLGVDKSQVTKDANFANDLGA^^ 60 

Query: 72 AVEHMSSVQQLLDYI 86 
A E +S++QQ +D+I 
Sbjct: 61 AAEQISNLQQAVDFI 75 

A related DNA sequence was identified in S.pyogenes <SEQ ID 989> which encodes the amino acid 
sequence <SEQ ID 990>. Analysis of this protein sequence reveals the following: 

Possible site: 24 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1917 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 36/77 (46%) , Positives = 57/77 (73%) 

Query: 12 MSRDEVFEKMLELLRQQLGDPQLDITPESSLHDDLAIDSIALTEFIINLEDVFHLEIPDE 71 

M+R E+FE+++ L+++Q + IT ++ L +DLA+DSI L EFIIN+ED FH+ IPDE 

Sbjct: 1 MTRQEIFERLINLIQKQRSYLSVAITEQTHLKNDLAVDSIELVEFIINVEDEFHIAIPDE 60 

Query: 72 AVEHMSSVQQLLDYIIE 88 
VE M ++ +LDY+++ 

Sbjct: 61 DVEDMVFMRDILDYLVQ 77 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 307 

A DNA sequence (GBSx0336) was identified in S.agalactiae <SEQ ID 991> which encodes the amino acid 
sequence <SEQ ID 992>. This protein is predicted to be fatty acid/phospholipid synthesis protein (plsX). 
Analysis of this protein sequence reveals the following: 

Possible site: 21 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.64 Transmembrane 101 - 117 ( 101 - 117) 

Final Results 

bacterial membrane Certainty=0.1256 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9455> which encodes amino acid sequence <SEQ ID 9456> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB13462 GB:Z99112 alternate gene name: ylpD [Bacillus subtilis] 
Identities = 174/329 (52%) , Positives = 238/329 (71%) , Gaps = 2/329 (0%) 

Query: 8 KIAIDAMGGDYAPKAIVEGVNQAISDFSDIEVQLYGDQKKIEKYLTVT-ERVSIIHTEEK 66 
+IA+DAMGGD+APKA+++GV +1 F D+ + L GD+ IE +LT T +R++++H +E 

Sbjct: 2 RIAVDAMG^DHAPKAVIDGVIKGIEAFDDIjHITLVGDKTTIESHLTTTSDRITVLHADEV 61 






Query: 67 INSDDEPAKAVRRKKQSSMVLGAKft.VKDGVAQAFISAGNTGALLAAGLFWGRIKGVDRP 126 
I DEP +AVRRKK SSMVL A+ V + A A ISAGNTGAL+ AGLF+ VGRI KG+DRP 
Sbjct: 62 IEPTDEPVRAWRKKNSSMVLMAQEVAENRADACISAGNTGALMTAGLFIVGRIKGIDRP 121 

Query: 127 
L T+PT+ G GF +LD+GAN + HL QYAI+GS Y++ VRG+ PRVGLLN GTE 
Sbjct: 122 

Query: 187 
+ KG+ L K+ +++L +INFIGN+EARDL+ VADWVTDGFTGN LKT+EG+A+S 
Sbjct: 182 

Query: 247 
I ++ + + + +KL A +LK L ++K M+YS+ GGA LFGLKAP++K HGSSDS 
Sbjct: 242 

Query: 307 
AV+ ++Q R M+ V + + +E 
Sbjct: 301 

A related DNA sequence was identified in S.pyogenes <SEQ ID 993> which encodes the amino acid 
sequence <SEQ ID 994>. Analysis of this protein sequence reveals the following: 

Possible site: 36 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.07 Transmembrane 121 - 137 ( 120 - 138) 

Final Results 

bacterial membrane Certainty=0.1829 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

A related sequence was also identified in GAS <SEQ ID 9127> which encodes the amino acid sequence 
<SEQ ID 9128>. Analysis of this protein sequence reveals the following: 

Possible cleavage site: 16 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.07 Transmembrane 95 - 111 ( 94 - 112) 

Final Results 

bacterial membrane Certainty= 0.183 (Affirmative) < succ> 

bacterial outside Certainty= 0.000 (Not Clear) < succ> 

bacterial cytoplasm Certainty= 0.000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 254/330 (76%), Positives = 290/330 (86%) 

Query: 6 MKKIAIDAMGGDYAPKAI VEGVNQAI SDFSDIEVQLYGDQKKIEKYLTVTERVS I IHTEE 65 

MK+IAIDAMGGD APKAIVEGVNQAI FSDIE+QLYGDQ KI YL ++RV+IIHT+E 
Sbjct: 27 MKRIAIDAMGGDNAPKAIVEGVNQAIEAFSDIEIQLYGDQTKINSYLIQSDRVAIIHTDE 86 


Query: 66 KINSDDEPAKAVRRKKQSSMVLGAKAVKIX3VAQAFISAGNTGALLAAGLFVVGRIKGVDR 125 

KI SDDEPAKAVRRKK++SMVL AKAVK+G A A ISAGNTGALLA GLFWGRI KGVDR 
Sbjct: 87 KIMSDDEPAKAVRRKKKASbWIAAKAVKEGKADAIISAGNTGALIAVGLFWGRIKGVDR 146 

Query: 126 PGLMSTMPTLDGVGFDMLDLGANAENTASHLHQYAILGSFYAKIIVRGIEVPRVGLIjISINGT 185 

PGL+ST+PT+ G+GFDMLDLGANAENTA HLHQYAILGSFYAKNVRGI PRVGLLNNGT 
Sbjct: 147 PGLLSTIPTVTGLGFDMLDLGANAEOTAKHLHQYAILGSFYAKNVRGIANPRVGLLNNGT 206 

Query: 186 EETKGDSLHKEAYELLAAEPSINFIGNIEARDLMSSVADWVTDGFTGNAVLKTMEGTAM 245 
EETKGD L K YELL A+ + I + F+GN+EAR+LMS VADV+V+DGFTGNAVLK+ +EGTA+ 

Sbjct: 207 EETKGDPLRKATYELLTADNTISFVGNVEARELMSGVADVIVSDGFTGNAVLKSIEGTAI 266 



Query: 246 SIMGSLKSSIKSGGVKAKLGALLLKDSLYQLKDSMDYSSAGGAVLFGLKAPIVKCHGSSD 305 
SIMG LK I SGG+K K+GA LLK SLY++K ++DYSSAGGAVLFGLKAP+VK HGSSD 
Sbjct: 267 SIMGQLKQIINSGGIKTKIGASLLKSSLYEMKKTLDYSSAGGAVLFGLKAPWKSHGSSD 326 

Query: 306 SKAVYSTLKQVRTMLETQWDQLVDAFTDE 335 
KA++ST+KQVRTML+T W QLV+ F E 

Sbjct: 327 VKAIFSTIKQWTMLDTNWGQLVEEFAKE 356 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 308 

A DNA sequence (GBSx0337) was identified in S.agalactiae <SEQ ID 995> which encodes the amino acid 
sequence <SEQ ID 996>. Analysis of this protein sequence reveals the following: 



Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4668 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 309 

A DNA sequence (GBSx0338) was identified in S.agalactiae <SEQ ID 997> which encodes the amino acid 
sequence <SEQ ID 998>. Analysis of this protein sequence reveals the following: 



Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-12.84 Transmembrane 61 - 77 ( 55 - 82) 
INTEGRAL Likelihood =-10.14 Transmembrane 26 - 42 ( 19 - 51) 
INTEGRAL Likelihood = -9.77 Transmembrane 192 - 208 ( 186 - 211) 
INTEGRAL Likelihood = -5.79 Transmembrane 267 - 283 ( 262 - 286) 
INTEGRAL Likelihood = -3.77 Transmembrane 100 - 116 ( 99 - 116) 



Final Results 

bacterial membrane Certainty=0.6137 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9453> which encodes amino acid sequence <SEQ ID 9454> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA22372 GB:AL034446 putative transmembrane protein 
[Streptomyces coelicolor A3(2)] 
Identities = 47/154 (30%), Positives = 69/154 (44%), Gaps = 12/154 (7%) 

Query: 120 SGFvEISSSNSFSFGPFFFLFIiAYFIQSLTEEILFRGYVMTTVTKFKGSFAGVLCNSMLF 179 
SG+ E+ S F+A + TEE++FRG + + + G++ + ++F 

Sbjct: 118 SGYYE VDGLGS VQGAIGLVGFMA- -AAAATEEWFRGVLFRI IEEHIGTYLALGLTGLVF 175 



Query: 180 SFIHFRN YGITAIALFNLFLDGIIFSILFNMTKNILFVTGVHTTWNFTMGCVLGN 234 

+H N +G AIA+ F+L ++ T+N+ GVH WNF G V 
Sbjct: 176 GLMHLLNEDATLWGALAIAIEAGFMLAAAYAA TRNLWLTIGVHFGWNFAAGGVFST 231 

Query: 235 KVSGGDSPVSLFRITENSSFALWNGGDFGFEGGV 268 
VSG L T S L GGDFG EG V 

Sbjct: 232 WSGNGDSEGLLDAT - MSGPKLLTGGDFGPEGSV 264 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
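
Predicted membrane-spanning regions are reported in these Examples as lines such as "INTEGRAL Likelihood = -0.64 Transmembrane 101 - 117 ( 101 - 117)". As a hedged illustration, the sketch below collects those lines into records and selects the segment with the most negative likelihood. The names `Segment`, `parse_segments` and `strongest` are hypothetical, and treating a more negative likelihood as a stronger prediction is an assumption made for this sketch, suggested by the order in which the lines are listed.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float  # value printed after "Likelihood ="
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


# Captures the likelihood and the first residue range; the bracketed
# extended range that follows on the same line is ignored here.
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return one Segment per INTEGRAL line found in the text."""
    return [
        Segment(float(score), int(start), int(end))
        for score, start, end in INTEGRAL.findall(text)
    ]


def strongest(segments):
    """Segment with the most negative likelihood (assumed strongest call)."""
    return min(segments, key=lambda segment: segment.likelihood)


report = """
INTEGRAL Likelihood = -0.64 Transmembrane 101 - 117 ( 101 - 117)
INTEGRAL Likelihood = -2.07 Transmembrane 121 - 137 ( 120 - 138)
"""
segments = parse_segments(report)
print(strongest(segments), [s.end - s.start + 1 for s in segments])
```

The two lines used here are the GBS and GAS predictions from Example 307; both segments are 17 residues long.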

Example 310 

A DNA sequence (GBSx0339) was identified in S.agalactiae <SEQ ID 999> which encodes the amino acid 
sequence <SEQ ID 1000>. Analysis of this protein sequence reveals the following: 



Possible site: 55 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2665 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 



A related GBS nucleic acid sequence <SEQ ID 9451> which encodes amino acid sequence <SEQ ID 9452> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05088 GB:AP001511 unknown conserved protein [Bacillus halodurans] 
Identities = 81/242 (33%), Positives = 124/242 (50%), Gaps = 3/242 (1%) 

Query: 8 GLVLYNRNYREDDKLVKIFTETEGKRMFFVKHAS- -KSKFNAVLQPLTIAHFILKINDNG 65 

G+V+ +Y E +K+V +FT GK + A KS+ AV Q T + + N G 

Sbjct: 7 GIVIRTVDYGESNKIVTVFTREYGKIALM^GAKRPKSRLTAVTQLFTYGMMMFQKNA-G 65 


Query: 66 LSYIDDYKEVLAFQETNSDLFKLSYASYITSLADVA1SDNVADAQLFIFLKKTLELIEDG 125 

L + + + +F+E +DLF+ SY SY+T L + D + LF L +T+ + +G 
Sbjct: 66 LGTLTQGEIIQSFREVRNDLFRASYVSYVTDLTNKLTEDEKRNPYLFELLYQTIHYMNEG 125 

Query: 126 LDYEILTNIFEVQLLERFGVALNFHDCVFCHRVGLPFDFSHKYSGLLCPNHYYKDERRNH 185 

+D ++LT IFEV++ G+ CV C +P FS K +G LC KD 

Sbjct: 126 MDPDVLTRIFEVKMFTVAGIKPELDQCVSCRSTDVPVGFSIKEAGFLCKRCIEKDPHAYK 185 

Query: 186 LDPNMLYLINRFQSIQFDDLQTISVKPEMKLKIRQFLDMIYDEYVGIHLKSKKFIDDLSSWG 247 
+ + L+ F L TIS+KPE K ++ + YDEY G+HLKS++F+D L S G 

Sbjct: 186 ITAQVAKLLRLFYHFDLQRLGTISLKPETKATLKTIIHQYYDEYSGLHLKSRRFLDQLESMG 247 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1001> which encodes the amino acid 
sequence <SEQ ID 1002>. Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1566 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 159/251 (63%) , Positives = 210/251 (83%) 

Query: 1 I^VSQTYGLVLYNRNYREDDKLVXIFTETEGKRMFFvTCHASKSKFNAVLQPLTIAHFILK 60 
M+++++ G+VL+NRNYREDDKLVKIFTE GK+MFFVKH S+SK ++++QPLTIA FI K 
Sbjct: 1 MQLTESLGIVLFNRNYREDDKLVKI FTEVAGKQMFFVKHISRSKMSS I IQPLTIADFIFK 60 

Query: 61 
+ND GLSY+ DY V ++ N+D+F+L+YASY+ +LAD AI+DN +D+ LF FLKKTL+ 
Sbjct: 61 

Query: 121 
L+E+GLDYEILTNI FE+Q+L+RFG+ +LNFH+C CHR LP DFSH++S +LC HYYKD 
Sbjct: 121 

Query: 181 
RRNHLDPN++YL++RFQ I FDDL+TIS+ ++K K+RQF+D +Y +YVGI LKSK FI 
Sbjct: 181 

Query: 241 
D+L WG IMK 
Sbjct: 241 DNLVKWGDIMK 251 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 311 

A DNA sequence (GBSx0340) was identified in S.agalactiae <SEQ ID 1003> which encodes the amino 
acid sequence <SEQ ID 1004>. This protein is predicted to be aromatic amino acid aminotransferase 
(patA). Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.13 Transmembrane 141 - 157 ( 140 - 159) 

Final Results 

bacterial membrane Certainty=0.2253 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9449> which encodes amino acid sequence <SEQ ID 9450> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF06954 GB:AF146529 aromatic amino acid aminotransferase 
[Lactococcus lactis subsp. cremoris] 
Identities = 261/391 (66%), Positives = 323/391 (81%) 

Query: 38 MTLEKRFNKYLDRIEVSLIRQFDQSISDIPGMVKLTLGEPDFTTPDHVKEAAKSAIDANQ 97 
M L K+FN LD+IE+SLIRQFDQ +S IP ++KLTLGEPDF TP+HVK+A +AI+ NQ 
Sbjct: 1 MDLLKKFNPNLDKIEISLIRQFDQQVSSIPDIIKLTLGEPDFYTPEHVKQAGIAAIENNQ 60 

Query: 98 SYYTGMSGLLALRQAAADFAKDKYI^TYNPDCEILvTIGATEALSASLIAILEAGDVVLL 157 
S+YTGM+GLL LRQAA++F KY L+Y + EILVT+G TEA+S+ L++IL AGD VL+ 
Sbjct: 61 SHYTGMAGLLELRQAASEFLLKKYGLSYAAEDEILVTVGVTEAISSVLLSILVAGDEVLI 120 

Query: 158 
PAPAYPGYEP++ L G +VEIDTR NDF LTPEML+ AII++ K+KAV+LNYP NPTG 
Sbjct: 121 

Query: 218 
+TY+R++I LAEVLKK+++FVI+DEVYSEL YT Q HVSIAEY P QTI++NGLSKSHA 
Sbjct: 181 

Query: 278 MTGWRVGLvYAPFAFIAQIIKSHQY^WTAASTISQFAGvEALSVGKNDTLPMRQGYIKRR 337 
MTGWR+GL++A +AQIIK+HQY+VT+AST SQFA +EAL G +D LPM++ Y+KRR 
Sbjct: 241 MTGWRIGLIFAARELVAQIIKTHQYLVTSASTQSQFAAIEALKNGADDALPMKKEYLKRR 300 

Query: 338 DYI IDKMSKLGFKI IKPSGAFYI FAKI PDSYPQDSFKFCQDFAYQQAVAI I PGVAFGKYG 397 

DYII+KMS LGFKII+P GAFYIFAKIP QDSFKF DFA + AVAI I PG+AFG+YG 
Sbjct: 301 DYI IEKMSALGFKI IEPDGAFYIFAKI PADLEQDSFKFAVDFAKENAVAI I PGIAFGQYG 360 

Query: 398 EGYIRLSYAASMEVIETAMARLKVFMESYEG 428 

EG++RLSYAASM+VIE AMARL ++ G 
Sbjct: 361 EGFTOLSYAASMDVIEQAMARLTDYVTKKRG 391 



There is also homology to SEQ ID 1006. 

SEQ ID 1004 (GBS332) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 60 (lane 3; MW 50.7kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 67 (lane 4; MW 76kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 312 

A DNA sequence (GBSx0341) was identified in S.agalactiae <SEQ ID 1007> which encodes the amino 
acid sequence <SEQ ID 1008>. This protein is predicted to be ribose-phosphate pyrophosphokinase (prsA). 
Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3118 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9447> which encodes amino acid sequence <SEQ ID 9448> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA62181 GB:M92842 prs [Listeria monocytogenes] 
Identities = 209/312 (66%) , Positives = 266/312 (84%) , Gaps = 3/312 (0%) 

Query: 10 LKLFALSSNKELAKKVSQTIGIPLGQSTVRQFSDGEIQVNIEESIRGHHVFILQSTSSPV 69 

LK+F+L+SN+ELA+++++ +GI LG+S+V FSDGEIQ+NI EES I RG HV+++QSTS+PV 
Sbjct: 10 LKIFSLNSNRELAEEIAKEVGIELGKSSVTHFSDGEIQINIEESIRGCHVYVIQSTSNPV 69 

Query: 70 NDNLMEILIMVDALKRASAESVSVVMPYYGYARQDRKARSREPITSKLVANMLEVAGVDR 129 
N NLME+LIM+DALKRASA ++++VMPYYGYARQDRKARSREPIT+KLVAN++E AG R 

Sbjct: 70 NQNLMELLIMIDALKRASAATINIVMPYYGYARQDRKARSREPITAKLVANLIETAGATR 129 

Query: 130 LLTvDLHAAQICGFFDIPVDHLMGAPLIADYFDRO^LVGDDVVWSPDHGGVTRARKLAQ 189 
++T+D+HA QIQGFFDIP+DHL L++DYF + L GDD+ VWS PDHGGVTRARK+A 
Sbjct: 130 MITLDMHAPQIC^FFDIPIDHLNAvRLLSDYFSERHL-GDDLVWSPDHGGVTRARKMAD 188 

Query: 190 CLK?TPIAIIDK3^SWKMNTSEvMNIIGNIKjGKKCILIDDMIDTAGTICHARDALAEAGA 249 

LK PIAIIDKRR + N +EVMNI +GN++GK CI+IDD+IDTAGTI AA AL EAGA 
Sbjct: 189 RLKAPIAI IDKRR- - PRPNVAE VMNIVGNVEGKVCI I IDDI IDTAGTITLA&KALREAGA 246 


Query: 250 TAvYASCTHPVLSGPALDNIQNSAIEKLIVLDTIYLPEERLIDKIEQISIAELIGEAIIR 309 

T VYA C+HPVLSGPA+ 1+ S IEKL+V ++I LPEE+ IDK+EQ+S+A L+GEAI+R 
Sbjct: 247 TKVYACCSHPvLSGPAMKRIEESPIEKLvVTNSIALPEEKWIDKMEQLSVAALLGEAIvR 306 

Query: 310 IHEKRPLSPLFE 321 

+HE +S LFE 
Sbjct: 307 VHENASVSSLFE 318 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1009> which encodes the amino acid 
sequence <SEQ ID 1010>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2685 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 298/322 (92%) , Positives = 311/322 (96%) 


Query: 1 MEEIMSYSNLKLFALSSNKELAKKVSQTIGIPLGQSTVRQFSDGEIQVNIEESIRGHHVF 60 

+EE MSYS+LKLFALSSNKELA+KV+ +GI LG+STVRQFSDGE I QVNI EES IRGHHVF 
Sbjct: 1 LEEKMSYSDLKLFALSSNKEIiAEKVASAMGIQLGKSTVRQFSDGEIQVNIEESIRGHHVF 60 

Query: 61 ILQSTSSPVNDNLMEILIMVDALKRASAESVSWMPYYGYARQDRKARSREPITSKLVAN 120 

ILQSTSSPVNDNLMEILIMVDALKRASAE +SWMPYYGYARQDRKARSREPITSKLVAN 
Sbjct: 61 ILQSTSSPVNDNLMEILIMVDALKRASAEKISVVMPYYGYARQDRKARSREPITSKLVAN 120 

Query: 121 MLEVAGVDRLLTVDLHAAQIQGFFDIP\7DHLMGAPLIADYFDRQGLVGDDWWSPDHGG 180 
MLEVAGVDRLLTVDLHAAQIQGFFDIPVDHIiMGAPLrADYFDR GLVG+DWWSPDHGG 

Sbjct: 121 MLEVAGTORLLTVTDLHAAQICGFFDIPVDHLMGAPLIADYFDRHGLVGEDVVVVSPDHGG 180 

Query: 181 VTRARKLAQCLKTPIAIIDKOTSVTKMNTSEVMNIIGNIKGKKCILIDDMIDTAGTICHA. 240 
VTRARKLAQ L+TPIAIIDKRRSV KMNTSKVMNI IGN+ GKKC1LIDDMIDTAGTICHA 
Sbjct: 181 VTRARKLAQFLQTPIAIIDKRRSVDKMNTSEVMNIIGNVSGKKCILIDDMIDTAGTICHA 240 

Query: 241 ADAIiAEAGATAVYASCTHPVLSGPALDNIQNSAIEKLIVLDTIYLPEERLIDKIEQISIA 300 

ADALAEAGATAVYASCTHPVLSGPALDNIQ SAIEKLIVLDTIYLP+ERLIDKIEQISIA 
Sbjct: 241 ADALAEAGATAVYASCTHPVLSGPALDNIQRSAIEKLIVLDTIYLPKERLIDKIEQISIA 300 


Query: 301 ELIGEAI IRIHEKRPLSPLFEM 322 

+L+ EAIIRIHEKRPLSPLFEM 
Sbjct: 301 DLVAEAI IRIHEKRPLSPLFEM 322 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 313 

A DNA sequence (GBSx0342) was identified in S.agalactiae <SEQ ID 1011> which encodes the amino 
acid sequence <SEQ ID 1012>. This protein is predicted to be a secreted protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3751 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9277> which encodes amino acid sequence <SEQ ID 9278> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 
>GP:AAD00288 GB:U78607 putative secreted protein [Streptococcus mutans] 
Identities = 111/157 (70%) , Positives = 130/157 (82%) , Gaps = 1/157 (0%) 

Query: 1 MTAIKGQVGALESQQSELEAQNAQLEAVSQQLGQEIQTLSNKIVARNESLKKQVRSAQKG 60 
+ I+GQV AL++QQ+EL+A+N +LEA S LGQ+IQTLS+KIVARNESLK+Q RSAQK 

Sbjct: 55 LITIQGQVSALQTQQAELQAENQRLEAQSATLGQQIQTLSSKIVARNESLKQQARSAQKS 114 

Query: 61 NL-TNYIOTILNSKSVSDAVITOWAIREWSANEKMIAQQEADKAALEAKQIENQNAINT 119 
N T+YIN I+NSKSVSDA+NRV AIREWSANEKML QQE DKAA+E KQ ENQ AINT 
Sbjct: 115 NAATSYINAIINSKSVSDAINRVSAIREWSANEKMLQQQEQDKAAVEQKQQENQAAINT 174 

Query: 120 VAANKQAIENNKAALATQRAQLEAAQLELSAQLTTVQ 156 

VAAN++ I N AL TQ+AQLEAAQL L A+LTT Q 
Sbjct: 175 VAANQETIAQNTNALNTQQAQI.EAAQLNLQAELTTAQ 211 


There is also homology to SEQ ID 1014. 

A related GBS gene <SEQ ID 8543> and protein <SEQ ID 8544> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
McG: Discrim Score: 8.29 

GvH: Signal Score (-7.5): 0.8 

Possible site: 49 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 6.74 threshold: 0.0 
PERIPHERAL Likelihood = 6.74 400 

modified ALOM score: -1.85 

*** Reasoning Step: 3 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

32.8/56.3% over 439aa

Lactococcus lactis

GP|512521| usp 45 Insert characterized

PIR|JN0097|JN0097 secreted 45K protein precursor - Insert characterized

ORF00094(301 - 1563 of 1941)

GP|512521|emb|CAA01320.1||A17083(1 - 440 of 461) usp 45 {Lactococcus lactis}
PIR|JN0097|JN0097 secreted 45K protein precursor - Lactococcus lactis

%Match = 16.5

%Identity = 32.8 %Similarity = 56.3

Matches = 141 Mismatches = 178 Conservative Sub.s = 101

93 123 153 183 213 243 273 303
RKyYNFKSNYTLFLFLF*FHYGVIILIE*IEEGYRFLDLIMVHLEIVDFKYKITNNDVI*FREFFGKIFNVLS*RSSLi™
I
M

333 387 417 447 477 507 537
KKRILSAVLVSGVTLGTAA--VTWADDFDSKIAATDSVINTLSGQQAAAQNQVTAIKGQVGALESQQSELEAQNAQLEA
Ihhlhhl I I II I II =1 II 1= h= =1 II II =1 =h =h =|| |::|:
KKKIISAILMSTVILSAAAPLSGVYAD-TNSDIAKQDATISSAQSAKAQAQAQVDSLQSKVDSLQQKQTSTKAQIAKIES
20 30 40 50 60 70 80

567 597 627 654 684 714 744 774
VSQQLGQEIQTLSNKIVARNESLKKQWSAQ-KGNLTNYIOTIIJJSKSVSDAvNRWAIREWSANEKMLAQ
:: I =1 11= I I = = |: 1 Mil = III- "Mil"! = =1 II I ||| :: |
EAKALNAQIATIiNESIKERTKTLFAQARSAQVNSSATNYMDAVVNSKSLTDVIQKOTAIATV
90 100 110 120 130 140 150 160

804 834 864 894 924 954 984 1014
EAKQIENQNAINTVAANKQAIENNKAALATQRAQLEAAQL^
170 180 190 200 210 220 230 240

1044 1065 1095 1125 1155 1185 1215
AQAEAKAQAESVA KAQAAAQVESATAPTETVQTQPRTEIKPSNLTATSSATTVATTTATATNEPKVTQPSVVTKA--
250 260 270 280 290 300 310 320

1266 1296 1326 1347 1374 1401 1455
-VEAPKAWSSTPRAVSKPWRSYDSSNTYPMGQCT WGA- KSMASWVGNYW-GNANQWGASARAAG- - YSVGTTPRV
330 340 350 360 370 380 390 400

1503 1533 1563 1593 1623 1653 1683
GAVAVWP YDGGGYGHVAWTSVANNSSIQVMESNYAGNMSIGNYRGSENPSASGSVYYIYPN**ILRRSFWSFLF
410 420 430 440 450 460


SEQ ID 8544 (GBS65) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 5 (lane 6; MW 47.5kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 13 (lane 3; MW 72kDa) and in Figure
175 (lanes 2 & 3; MW 72kDa).

The GBS65-GST fusion product was purified (Figure 102A; see also Figure 191, lane 4) and used to
immunise mice (lane 1 product; 20µg/mouse). The resulting antiserum was used for Western blot (Figure
102B), FACS, and in the in vivo passive protection assay (Table III). These tests confirm that the protein is
immunoaccessible on GBS bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 314 

A DNA sequence (GBSx0343) was identified in S.agalactiae <SEQ ID 1015> which encodes the amino
acid sequence <SEQ ID 1016>. Analysis of this protein sequence reveals the following:

Possible site: 18
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1184 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 315

A DNA sequence (GBSx0344) was identified in S.agalactiae <SEQ ID 1017> which encodes the amino
acid sequence <SEQ ID 1018>. Analysis of this protein sequence reveals the following:

Possible site: 23
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4736 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 316 

A DNA sequence (GBSx0345) was identified in S.agalactiae <SEQ ID 1019> which encodes the amino
acid sequence <SEQ ID 1020>. This protein is predicted to be elongation factor Tu (tufA). Analysis of this
protein sequence reveals the following:

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3012 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
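
The "Final Results" blocks used throughout these Examples follow a fixed line pattern (compartment, certainty value, call), so they can be tabulated mechanically. The following is a minimal sketch only, assuming the lines have already been cleaned of extraction noise (no spaces inside the number); the names `RESULT_RE`, `parse_final_results` and `predicted_site` are hypothetical helpers for illustration and are not part of the prediction program whose output is shown.

```python
import re

# Hypothetical pattern, inferred from the result lines printed in this document:
#   bacterial <site> [---] Certainty=<value> (<call>) < succ>
RESULT_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)\s*(?:-+\s*)?"
    r"Certainty\s*=\s*(?P<score>[0-9.]+)\s*\((?P<call>[^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (site, certainty, call) tuples, one per result line."""
    return [
        (m.group("site"), float(m.group("score")), m.group("call"))
        for m in RESULT_RE.finditer(text)
    ]


def predicted_site(text):
    """Return the compartment with the highest certainty, or None if no lines parse."""
    rows = parse_final_results(text)
    return max(rows, key=lambda row: row[1])[0] if rows else None


# Values copied from the Example 316 result block above.
example = """
bacterial cytoplasm --- Certainty=0.3012 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

print(parse_final_results(example))
print(predicted_site(example))
```

Taking the maximum certainty is a simplification chosen for this sketch; in the blocks shown here the highest value always coincides with the single "Affirmative" call.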

A related GBS nucleic acid sequence <SEQ ID 9737> which encodes amino acid sequence <SEQ ID 9738> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB03851 GB:AP001507 translation elongation factor Tu (EF-Tu) 
[Bacillus halodurans] 
Identities = 302/397 (76%) , Positives = 350/397 (88%) , Gaps = 2/397 (0%) 

Query: 7 MAKEKYDRSKPHVNIGTIGHVDHGKTTLTAAITTVLARRLPTSVNQPKDYASIDAAPEER 66
MAKEK+DRSK H NIGTIGHVDHGKTTLTAAITTVLA+R V Y +ID APEER
Sbjct: 1 MAKEKFDRSKTHANIGTIGHVDHGKTTLTAAITTVLAKRSGKGVAMA--YDAIDGAPEER 58

Query: 67 ERGITINTAHVEYETEKRHYAHIDAPGHADYVKNMITGAAQMDGAILVVASTDGPMPQTR 126
ERGITI+TAHVEYET+ RHYAH+D PGHADYVKNMITGAAQMDG ILVV++ DGPMPQTR
Sbjct: 59 ERGITISTAHVEYETDNRHYAHVDCPGHADYVKNMITGAAQMDGGILVVSAADGPMPQTR 118

Query: 127 EHILLSRQVGVKHLIVFMNKVDLVDDEELLELVEMEIRDLLSEYDFPGDDLPVIQGSALK 186
EHILLSRQVGV +L+VF+NK D+VDDEELLELVEME+RDLLSEYDFPGDD+PVI+GSALK
Sbjct: 119 EHILLSRQVGVPYLVVFLNKCD^WDDEELLELVEMEVRDLLSEYDFPGDDVPVIRGSALK 178

Query: 187 ALEGDEKYEDIIMELMSTVDEYIPEPERDTDKPLLLPVEDVFSITGRGTVASGRIDRGTV 246
ALEGD ++E+ I+ELM+ VD+YIP PERDT+KP ++PVEDVFSITGRGTVA+GR++RG +
Sbjct: 179 ALEGDAEWEEKIIELMAAVDDYIPTPERDTEKPFMMPVEDVFSITGRGTVATGRVERGQL 238

Query: 247 RVNDEVEIVGIKEDIQKAVVTGVEMFRKQLDEGLAGDNVGVLLRGVQRDEIERGQVLAKP 306
V DEVEI+G++E+ +K VTGVEMFRK LD AGDN+G LLRGV R+E++RGQVLAKP
Sbjct: 239 NVGDEVEIIGLEEEAKKTTVTGVEMFlUaiiLDYAEAGDNIGALLRGVSREEVQRGQVLAKP 298

Query: 307 GSINPHTRFKGEVYILSKEEGGRHTPFFNNYRPQFYFRTTDVTGSIELPAGTEMVMPGDN 366
G+I PHT FK EVY+LSKEEGGRHTPFF+NYRPQFYFRTTDVTG I+LP G EMVMPGDN
Sbjct: 299 GTITPHTNFKAEVYVLSKEEGGRHTPFFSNYRPQFYFRTTDVTGIIQLPDGVEMVMPGDN 358

Query: 367 VTIEVELIHPIAVEQGTTFSIREGGRTVGSGIVSEIE 403
V + VELI PIA+E+GT FSIREGGRTVG+G+V+ I+
Sbjct: 359 VEMTVELIAPIAIEEGTKFSIREGGRTVGAGVVASIQ 395

A related DNA sequence was identified in S.pyogenes <SEQ ID 1021> which encodes the amino acid 
sequence <SEQ ID 1022>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1367 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 386/404 (95%) , Positives = 396/404 (97%) 



Query: 1   MEAFPKMAKEKYDRSKPHVNIGTIGHVDHGKTTLTAAITTVLARRLPTSVNQPKDYASID 60
           +EAFPKMAKEKYDRSKPHVNIGTIGHVDHGKTTLTAAITTVLARRLP+SVNQPKDYASID
Sbjct: 12  LEAFPKMAKEKYDRSKPHVNIGTIGHVDHGKTTLTAAITTVLARRLPSSVNQPKDYASID 71

Query: 61  AAPEERERGITINTAHVEYETEKRHYAHIDAPGHADYVKNMITGAAQMDGAILVVASTDG 120
           AAPEERERGITINTAHVEYET  RHYAHIDAPGHADYVKNMITGAAQMDGAILVVASTDG
Sbjct: 72  AAPEERERGITINTAHVEYETATRHYAHIDAPGHADYVKNMITGAAQMDGAILVVASTDG 131

Query: 121 PMPQTREHILLSRQVGVKHLIVFMNKVDLVDDEELLELVEMEIRDLLSEYDFPGDDLPVI 180
           PMPQTREHILLSRQVGVKHLIVFMNKVDLVDDEELLELVEMEIRDLLSEYDFPGDDLPVI
Sbjct: 132 PMPQTREHILLSRQVGVKHLIVFMNKVDLVDDEELLELVEMEIRDLLSEYDFPGDDLPVI 191

Query: 181 QGSALKALEGDEKYEDIIMELMSTVDEYIPEPERDTDKPLLLPVEDVFSITGRGTVASGR 240
           QGSALKALEGD K+EDIIMELM TVD YIPEPERDTDKPLLLPVEDVFSITGRGTVASGR
Sbjct: 192 QGSALKALEGDTKFEDIIMELMDTVDSYIPEPERDTDKPLLLPVEDVFSITGRGTVASGR 251

Query: 241 IDRGTVRVNDEVEIVGIKEDIQKAVVTGVEMFRKQLDEGLAGDNVGVLLRGVQRDEIERG 300
           IDRGTVRVNDE+EIVGIKE+ +KAVVTGVEMFRKQLDEGLAGDNVG+LLRGVQRDEIERG
Sbjct: 252 IDRGTVRVNDEIEIVGIKEETKKAVVTGVEMFRKQLDEGLAGDNVGILLRGVQRDEIERG 311

Query: 301 QVLAKPGSINPHTRFKGEVYILSKEEGGRHTPFFNNYRPQFYFRTTDVTGSIELPAGTEM 360
           QV+AKP SINPHT+FKGEVYILSK+EGGRHTPFFNNYRPQFYFRTTDVTGSIELPAGTEM
Sbjct: 312 QVIAKPSSINPHTKFKGEVYILSKDEGGRHTPFFNNYRPQFYFRTTDVTGSIELPAGTEM 371

Query: 361 VMPGDNVTIEVELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA 404
           VMPGDNVTI VELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA
Sbjct: 372 VMPGDNVTINVELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA 415
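
Each alignment row carries the residue number at which it starts and the number at which it ends, so a row can be checked for internal consistency: the residues printed (ignoring gap characters) should number end minus start plus one. The following is a minimal sketch of such a check, applied to the last Query and Sbjct rows of the alignment above (residues 361 to 404 and 372 to 415). The names `ROW_RE` and `check_row` are hypothetical, and the sketch assumes each row sits on a single line in the form `Query: <start> <sequence> <end>`.

```python
import re

# Hypothetical row pattern: label, start coordinate, sequence (no spaces), end coordinate.
ROW_RE = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")


def check_row(line):
    """Return (label, start, end, consistent) for an alignment row, or None if it does not parse.

    'consistent' is True when the number of residues (gap characters '-' excluded)
    equals end - start + 1.
    """
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    label, start, seq, end = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
    residues = len(seq.replace("-", ""))
    return label, start, end, residues == end - start + 1


query_row = "Query: 361 VMPGDNVTIEVELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA 404"
sbjct_row = "Sbjct: 372 VMPGDNVTINVELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA 415"

print(check_row(query_row))
print(check_row(sbjct_row))
```

A row that fails this check has lost or gained characters somewhere, although the check cannot say where; a row that passes may still contain substituted characters.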





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 317 

A DNA sequence (GBSx0346) was identified in S.agalactiae <SEQ ID 1023> which encodes the amino 
acid sequence <SEQ ID 1024>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -0.64 Transmembrane 90 - 106 ( 90 - 106)

Final Results

bacterial membrane Certainty=0.1256 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 318 

A DNA sequence (GBSx0347) was identified in S.agalactiae <SEQ ID 1025> which encodes the amino 
acid sequence <SEQ ID 1026>. This protein is predicted to be ftsW. Analysis of this protein sequence 
reveals the following: 

Possible site: 38

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.15 Transmembrane 44 - 60 ( 35 - 70) 
INTEGRAL Likelihood = -4.73 Transmembrane 76 - 92 ( 74 - 98) 
INTEGRAL Likelihood = -3.88 Transmembrane 117 - 133 ( 113 - 134) 
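
The INTEGRAL lines above each report a likelihood score, a predicted transmembrane core (start and end residue) and, in parentheses, an extended range. The following is a minimal sketch of how such lines could be read into structured records, assuming the one-line form shown above; `TM_RE` and `parse_tm_segments` are hypothetical names introduced for illustration only.

```python
import re

# Hypothetical pattern for lines of the form:
#   INTEGRAL Likelihood =-11.15 Transmembrane 44 - 60 ( 35 - 70)
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return one dict per INTEGRAL line with its score, core range and extended range."""
    segments = []
    for m in TM_RE.finditer(text):
        start, end, ext_start, ext_end = (int(g) for g in m.groups()[1:])
        segments.append(
            {
                "likelihood": float(m.group(1)),
                "core": (start, end),
                "extended": (ext_start, ext_end),
            }
        )
    return segments


# Lines copied from the Example 318 prediction above.
lines = """
INTEGRAL Likelihood =-11.15 Transmembrane 44 - 60 ( 35 - 70)
INTEGRAL Likelihood = -4.73 Transmembrane 76 - 92 ( 74 - 98)
INTEGRAL Likelihood = -3.88 Transmembrane 117 - 133 ( 113 - 134)
"""

for seg in parse_tm_segments(lines):
    core_len = seg["core"][1] - seg["core"][0] + 1
    print(seg["likelihood"], seg["core"], seg["extended"], core_len)
```

In the three lines used here each core range spans 17 residues, and the lines are listed from the most negative likelihood to the least negative.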



Final Results 

bacterial membrane Certainty=0.5458 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB39929 GB:U58049 putative cell division protein ftsW 
[Enterococcus hirae] 
Identities = 78/159 (49%) , Positives = 107/159 (67%) , Gaps = 4/159 (2%) 

Query: 1 MANSXYAMSNGGWFGRGLGNSIEKLGYLPEATTDFVFSIVIEELGVIGAGFILALVFFLI 60
M+NS YA+ NGG FGRG+GNSI K GYLPE+ TDF+FS++ EE G+IGA +L L+F L
Sbjct: 240 MSNSYYALYNGGLFGRGMGNSITKKGYLPESETDFIFSVIAEEFGLIGALLVLFLLFLLC 299

Query: 61 LRIMHVGIKAKDPFNSMIALGIGAMLLMQVFVNIGGISGLIPSTGVTFPFLSQGGNSLLV 120
+RI K K+ ++I +G+G +L+Q +NIG I GLIP TGV PF+S GG S L+
Sbjct: 300 MRIFQKSTKQKNQQANLILIGVGTWILVQTSINIGSILGLIPMTGVPLPFVSYGGTSYLI 359

Query: 121 LSVAIGFVLNIDANEKKELIMKEAEEQYKPQEKNEKIIN 159
LS AIG LNI + + KE +++ + QK K++N
Sbjct: 360 LSFAIGLALNISSRQVKE----KNKQVERLQLKKPKLLN 394
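
The summary line that heads each database hit (Identities, Positives, Gaps) gives each statistic as a count over the alignment length. The following is a minimal sketch of extracting those counts, using the summary line of the Enterococcus hirae hit above; `STAT_RE` and `parse_hit_summary` are hypothetical names. The printed percentages are deliberately not recomputed or checked here, since the rounding convention used by the alignment program is not stated in this document.

```python
import re

# Hypothetical pattern for "<name> = <count>/<length>" items in a hit summary line.
STAT_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_hit_summary(line):
    """Map each statistic name to its (count, alignment_length) pair."""
    return {name: (int(count), int(length)) for name, count, length in STAT_RE.findall(line)}


summary = parse_hit_summary(
    "Identities = 78/159 (49%) , Positives = 107/159 (67%) , Gaps = 4/159 (2%)"
)

for name, (count, length) in summary.items():
    print(f"{name}: {count}/{length} = {count / length:.3f}")
```

Summary lines that omit the Gaps item, such as those heading the GAS and GBS alignments, simply yield a dictionary with two entries.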

A related DNA sequence was identified in S.pyogenes <SEQ ID 1027> which encodes the amino acid 
sequence <SEQ ID 1028>. Analysis of this protein sequence reveals the following: 

Possible site: 51
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.93 Transmembrane 312 - 328 ( 303 - 338)
INTEGRAL Likelihood = -8.23 Transmembrane 22 - 38 ( 17 - 47)
INTEGRAL Likelihood = -6.85 Transmembrane 192 - 208 ( 187 - 211)
INTEGRAL Likelihood = -5.10 Transmembrane 218 - 234 ( 212 - 236)
INTEGRAL Likelihood = -4.83 Transmembrane 86 - 102 ( 85 - 107)
INTEGRAL Likelihood = -3.72 Transmembrane 385 - 401 ( 383 - 402)
INTEGRAL Likelihood = -3.45 Transmembrane 61 - 77 ( 61 - 79)
INTEGRAL Likelihood = -2.39 Transmembrane 344 - 360 ( 344 - 360)

Final Results

bacterial membrane Certainty=0.5373 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB59721 GB:AJ250603 FtsW protein [Enterococcus faecium] 
Identities = 131/397 (32%) , Positives = 223/397 (55%) , Gaps = 23/397 (5%) 

Query: 15 KRHLIJSrYSIIiLPYLILSVIGLIMVYSTTSVSIiIQAHftNPFKSVINQGVFWIISLVAITFI 74 

KR +++ IL PYL LS+IGL+ VYS +S L+QA N ++ Q +F +S I 
Sbjct: 3 KRKKIDWWILGPYLTLSMIGLLEWSASSYRLLQADENTKSLLLRQLIFIFLSWGVIFLA 62 

Query: 75 YKLKLNFLTNTRVLTVVMLGEAFLLI IAR- - FFTTAIRGAHGWIVIGPVSFQPAEYLKI I 132 

+KL++L + ++ + F LI+ R F + GA WI + + FQP+E + 
Sbjct: 63 RSIKLHYLLHPKIAGYGIALSIFFLILTOVGIFGVTVNGAQRWISLFGIQFQPSEIiANLF 122 

Query: 133 IWWYLALTFAKIQKNISLYDYQALTRRKJWPTQWNDLRDWRVYSLLMVLLVAAQPDLGNA 192 

+++YL+ F p + +L+ + ++ + LL+ QP + A 

Sbjct: 123 LIFYLSWFFRDGNN PPK--NLKKPFLITVSITLLILFQPKIAGA 164 

Query: 193 SIIVLTAIIMFSISGIGYRWFSAILVMITGLSTVFLGTIAVIGVERVAKIP-VFGYVAKR 251 

+1+ A ++F + + ++ ++V + L G + +G + +P +F + +R 

Sbjct: 165 LMILSIAWVIFWAAAVPFKKGIYLIVTFSALLIGAA.GGVLYLGNK- -GWLPQMFNHAYER 222 

Query: 252 FSAFFNPFHDLTDSGHQLANSYYAMSNGGWFGQGLGNSIEKRGYLPEAQTDFVFSWIEE 311 

+ +PF D +G+Q+ +S+YA+ NGG +G+GLGNSI K+GYLPE +TDF+FS++ EE 
Sbjct: 223 IATLRDPFIDSHGAGYQMTHSFYALYNGGIWGRGLGNSITKKGYLPETETDFIFSIITEE 282 

Query: 312 LGLIGAGFILALVFFLILRIMOTGIKAKNPFNAMMALGVGGMMLMQVFVNIGGISGLIPS 371

LGLIGA +L L+F L +RI + + KN + LG G ++ +Q +N+G I+GL+P
Sbjct: 283 LGLIGALCVLFLLFSLCMRIFCLSSRCKNQQAGLFLLGFGTLLFVQTIMNVGSIAGLMPM 342

Query: 372 TGVTFPFLSQGGNSLLVLSVAVGFVLNIDASEKRDDI 408

TGV PF+S GG S L+LS+ +G LNI + + +++ 
Sbjct: 343 TGVPLPFVSYGGTSYLILSLGIGITIJSriSSKIQAEEL 379 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 130/166 (78%) , Positives = 152/166 (91%) , Gaps = 2/166 (1%) 

Query: 1 MANSXYAMSNGGWFGRGLGNSIEKLGYLPEATTDFVFSIVIEELGVIGAGFILALVFFLI 60
+ANS YAMSNGGWFG+GLGNSIEK GYLPEA TDFVFS+VIEELG+IGAGFILALVFFLI
Sbjct: 269 LANSYYAMSNGGWFGQGLGNSIEKRGYLPEAQTDFVFSVVIEELGLIGAGFILALVFFLI 328

Query: 61 LRIMHVGIKAKDPFNSMIALGIGAMLLMQVFVNIGGISGLIPSTGVTFPFLSQGGNSLLV 120
LRIM+VGIKAK+PFN+M+ALG+G M+LMQVFVNIGGISGLIPSTGVTFPFLSQGGNSLLV
Sbjct: 329 LRIMWGIKAKNPFNAMMALGVGGMMLMQVFVNIGGISGLIPSTGVTFPFLSQGGNSLLV 388

Query: 121 LSVAIGFVLNIDANEKKELIMKEAEEQYK--PQEKNEKIINLDAFK 164
LSVA+GFVLNIDA+EK++ I KEAE Y+ +++N K++N+ F+
Sbjct: 389 LSVAVGFVLNIDASEKRDDIFKEAELSYRKDTRKENSKVVNIKQFQ 434

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 319 

A DNA sequence (GBSx0348) was identified in S.agalactiae <SEQ ID 1029> which encodes the amino 
acid sequence <SEQ ID 1030>. This protein is predicted to be probable cell division protein ftsw (ftsW). 
Analysis of this protein sequence reveals the following: 



Possible site: 34

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.4906 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9327> which encodes amino acid sequence <SEQ ID 9328> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA44490 GB:X62621 ORF2 N-terminal [Lactococcus lactis]
Identities = 82/199 (41%) , Positives = 122/199 (61%) , Gaps = 9/199 (4%)

Query: 1 MKIDKRHLLNYSILIPYLILSILGLIVIYSTTSATLIQLGANPFRSVINQGVFWAVSLVA 60
M ++K + LNYSILIPYLIL+ +G+++I+STT +Q G NP++ VINQ F +S++
Sbjct: 1 MNmKNNFI^SILIPYLIIAGIGIVMIFSTTvPDQLQKGLNPYKLVINQTAFVLLSIIM 60

Query: 61 IIFIYKLKLNFLKNSKVLTMAVLVEVFLLLIARF------FTQEVNGAHGWIVIGPI-SF 113
I IY+LKL LKN K++ + +++ + L+ R T VNGA GWI 11 +
Sbjct: 61 IAVIYRLKLRALKNRKMIGIIMVILILSLIFCRIMPSSFALTAPVNGARGWIHIPGIGTV 120

Query: 114 QPAEYLKVIIVWYLAFTFARRQKKIEIYDYQALTKGRWLPRSLSDLKDWRFYSLFMIGLV 173
QPAE+ KV I+WYLA F+ +Q++IE D + KG+ L + L WR + ++ +
Sbjct: 121 QPAEFAKVFIIWYLASVFSTKQEEIEKNDINEIFKGKTLTQKL--FGGWRLPWAILLVD 178

Query: 174 IAQPDLGNGSIIVLTVIIM 192
+ PDLGN II +IM
Sbjct: 179

There is also homology to SEQ ID 1028.

A related GBS gene <SEQ ID 8545> and protein <SEQ ID 8546> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 6
McG: Discrim Score: 15.18
GvH: Signal Score (-7.5): -3.58
Possible site: 34
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 5 value: -9.77 threshold: 0.0
INTEGRAL Likelihood = -9.77 Transmembrane 12 - 28 ( 7 - 37)
INTEGRAL Likelihood = -7.22 Transmembrane 76 - 92 ( 74 - 97)
INTEGRAL Likelihood = -6.69 Transmembrane 210 - 226 ( 201 - 227)
INTEGRAL Likelihood = -6.53 Transmembrane 182 - 198 ( 178 - 201)
INTEGRAL Likelihood = -4.62 Transmembrane 51 - 67 ( 46 - 69)
PERIPHERAL Likelihood = 1.32 116
modified ALOM score: 2.45

*** Reasoning Step: 3

Final Results 

bacterial membrane Certainty=0.4906 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF02700(301 - 876 of 1377)

EGAD|8615|8419(1 - 197 of 198) hypothetical protein in rpmg 3' region, fragment
{Lactococcus lactis} SP|P27174|YRG2_LACLA HYPOTHETICAL PROTEIN IN RPMG 3' REGION (ORF2)
(FRAGMENT). GP|44069|emb|CAA44490.1||X62621 ORF2 N-terminal {Lactococcus lactis}
PIR|PC1134|PC1134 hypothetical protein 198 (rmpG 3' region) - Lactococcus lactis (fragment)
%Match = 15.1

%Identity = 42.3 %Similarity = 64.9

Matches = 82 Mismatches = 64 Conservative Sub.s = 44

87 117 147 177 207 237 267 297
KA*I*Y*I**L*LVILFLLPFFINFL*IYLTGIOT*IWPSNISN*SFIFVISIVGGYXX*LIXXXIMHNGNFLKY*RK*Y

327 357 387 417 447 477 507 537
NMKIDKRHLI^SILIPYLILSILGLIVIYSTTSATLIQLGJU^PFRSVINQGVFWAVSLVAIIFIYKlKmFLKNSrottT
10 20 30 40 50 60 70

567 585
MAVLVEVFLLLIARF

609 636 666 696 726 756
FT--QEVNGAHGWIVIGPI-SFQPAEYLKVIIWYLAFTFARRQKKIEIYDYQALTKGRWL
IIMVILILSLIFCRIMPSSFALTAPTOGARGWIHIPGIGTVQPAEFAKVFIIWYLASVFSTKQEEIEKNDINEIFKGKTL
90 100 110 120 130 140 150

786 816 846 876 906 936 966 996
PRSLSDLKDWRFYSLFMIGLVIAQPDLGNGSIIVLTVIIMYCISGIGYRWFSALLGLIWGSTLFIGTIAWGVETMAKV
TQKL- - FGGWRLPWAILLVDLIMPDLGNTMI IGAVALIMI
170 180 190


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 320 

A DNA sequence (GBSx0349) was identified in S.agalactiae <SEQ ID 1031> which encodes the amino 
acid sequence <SEQ ID 1032>. Analysis of this protein sequence reveals the following: 

Possible site: 22

>>> Seems to have no N-terminal signal sequence



The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1033> which encodes the amino acid 
sequence <SEQ ID 1034>. Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have no N-terminal signal sequence



Final Results 



bacterial cytoplasm Certainty=0.3665 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



Final Results



bacterial cytoplasm Certainty=0.2373 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 



Identities = 35/41 (85%) , Positives = 37/41 (89%) 



Query: 1 MEKEAKQIIDLKRNLFKIDVRAQKDEEKVFMRTACCYSPFY 41
         +EKEAKQ+IDLKRNLFKIDVRAQKDEEKVFMRTAC  S  Y
Sbjct: 1 LEKEAKQMIDLKRNLFKIDVRAQKDEEKVFMRTACRQSRVY 41
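
The "Identities" figure reported for an alignment is the number of aligned columns in which the two sequences carry the same residue. The following is a minimal sketch that recounts it for the 41-residue GAS/GBS alignment above; the function name `count_identities` is hypothetical. The "Positives" figure is not recomputed, because it additionally depends on a substitution matrix that is not named in this document.

```python
def count_identities(query, subject):
    """Count aligned columns where both rows carry the same residue (gap columns excluded)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same number of columns")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Rows copied from the Example 320 alignment above (Query = GBS, Sbjct = GAS).
query = "MEKEAKQIIDLKRNLFKIDVRAQKDEEKVFMRTACCYSPFY"
subject = "LEKEAKQMIDLKRNLFKIDVRAQKDEEKVFMRTACRQSRVY"

identical = count_identities(query, subject)
print(f"Identities = {identical}/{len(query)}")  # Identities = 35/41
```

The recount agrees with the 35/41 stated in the summary line of that alignment.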



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 321

A DNA sequence (GBSx0351) was identified in S.agalactiae <SEQ ID 1037> which encodes the amino
acid sequence <SEQ ID 1038>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.65 Transmembrane 78 - 94 ( 78 - 95)
INTEGRAL Likelihood = -1.33 Transmembrane 421 - 437 ( 420 - 437)

Final Results

bacterial membrane Certainty=0.1659 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA00827 GB:A09073 phosphoenol pyruvate carboxylase 
[Corynebacterium glutamicum] 
Identities = 335/958 (34%) , Positives = 539/958 (55%) , Gaps = 80/958 (8%) 

Query: 22 EIITEEVGLLKQLLDEATQKLIGSESFDKIE--KIVSLSLTD DYTGLKETISALSNE 76 

+ + +++ L Q+L E + G E ++ +E ++ S + + L + ++ 

Sbjct: 3 DFLRDDIRFLGQILGEVIAEQEGQEVYELVEOARLTSFDIAKGNAEMDSLVQVFDGITPA 62 

Query: 77 BMVI VSRYFS I LPLLINI SEDVDLAYE I NYKNNLNQDYLGKLST TIDW 125 

+ ++R FS LL N++ED+ Y L + L T T+D 

Sbjct: 63 KATPIARAFSHFALLANLAEDL YDEELREQALDAGDTPPDSTLDATWLKLNEG 115 

Query: 126 -AGHENAKDILEHVNWPVLTAHPTO^JRKTvl^TSKI 180 

G E D+L + V PVLTAHPT+ +R+TV + I +R+ +++ 
Sbjct: 116 NVGAEAVADVLRNAEVAPVLTAHPTETRRRTVPD 175 

Query: 181 - -EKWYADLRRYIGI IMQTDTIREKKLron<NEITNVJffiYYNRSLIKAVTKLTAEYKALAA 238 

++ ++RR I 1+ QT IR + ++++EI + YY SL++ + ++ + 
Sbjct: 176 KLDEIEKNIRRRITILWQTALIRVARPRIEDEIEVGLRYYKLSLLEEIPRINRDVAVELR 235 

Query: 239 KK GIHLENPKPLTM-GMWIGGDRDGNPFVTAETLRLSAMVQSEVIINHYIEQLNELY 294 

4+ G+ L KP+ G WIGGD DGNP+VTAET+ S +E ++ +Y QL+ h 
Sbjct: 236 ERFGEGVPL KP VWPGSWIGGDHDGNPYTOAETVEYSTHRAAETVLBCYYARQLHSLE 292 

Query: 295 RNMSLSINLTEVSPELVTIANQSQDNSVYRENEPYRKAFNFIQDKLVQTLLNLKVGSSPK 354 

+SLS + +V+P+L-H LA+ ++ R +EPYR+A + +4- +++ T 
Sbjct: 293 HELSLSDR^INKOTPQLIJy^AGHNDVPSRvI)EPYRRAVHGVRGRILAT 341 

Query: 355 EKFVSRQESSDIVGRYIKSHIAQVASDIQTEELPAYATAEEFKQDLLLVKQSLVQYGQDS 414 

+++++G + + YA+ EEF D L + SL + 

Sbjct: 342 TAELIGE DAVEGVWFKVFTPYASPEEFLNDALTIDHSLRESKDVL 386 

Query: 415 LVnGELACLIQAVDIFGFYLATIDMRQDSSINEACTOELLKSANIVDDYSSLSEEEKCQL 474 

+ D L+ LI A++ FGF L +D+RQ+S E + EL + A + +Y LSE EK ++ 
Sbjct: 387 IADDRLSvLISAIESFGFMjYALDLRQNSESYEDVLTELFERAQVTANYRELSEAEKLEV 446 

Query: 475 LLKELTEDPRTLSSTHAPKSELLQKELAIFQTARELKDQLGEDI INQHI ISHTES VSDMF 534 

LLKEL + SE+ +F5L IF+TA E + G ++ IIS SV+D+ 

Sbjct: 447 LLKELRSPRPLIPHGSDEYSEVTDRELGIFRTASEAVKKFGPRMVPHCIISMASSVTDVL 506 

Query: 535 ELAIMLKEVGLIDAN QARIQIVPLFETIEDLIOTSRDIMTQYLHYELVKKWIATNNN 590 

E ++LKE GLI AN + + ++PLFETIEDL 1+ + +L + ++ +N 

Sbjct: 507 EP^fVLLKEFGLIAANGDNPRGTVDVIPLFETIEDI^GAGILDELWKIDLYRNYLLQRDN 566 

Query: 591 YQEIMLGYSDSNKDGGYLSSGWTLYKAQNELTKIGEENGIKITFFHGRGGTVGRGGGPSY 650 

QE+MLGYSDSNKDGGY S+ W LY A+ +L +4- G+K+ FHGRGGTVGRGGGPSY 
Sbjct: 567 VQEVMLGYSDSNKI3GGYFSANWALYDAELQLVELC3JSAGVKLRLFHGRGGTVGRGGGPSY 626 

Query: 651 EAITSQPFGSIKDRIRLTEQGEIIENKYGNQnAAYYI!ttEMLISASIDR^^VTRMITNPNEI 710
+AI +QP G+4-+ +R+TEQGEII KYGN + A NLE L+SA+++ +• + +E+ 
Sbjct: 627 DAIIiAQPRGAVQGSVRITEQQEIISAKyGNPETARRNLEALVSATLE ASLLDVSEL 682

Query: 711 DNFRETMDGIVSESNAV YRNLVFDNPYFYDYFFEASPIKEVSSLNIGSRPAARKTI 766 

+ + D I+SE + + Y +LV ++ F USF +++P++E4- SLNIGSRP++RK 
Sbjct: 683 TDHQRAYD-IMSEISELSIiKKYASIjVHEDQGFIDYFTQSTPLQEIGSLNIGSRPSSRKQT 741 

Query: 767 TEISGLRAIPWVFSWSQNRIMFPGWYGVGSAFKHFI EQDEANL&KLQTMYQKWPFFN 823 

+ + LRAIPWV SWSQ+R+M PGW+GVG+A + +1 EQ +A+LQT+ + WPFF 
Sbjct: 742 SSVEDLRAIPWVLSWSQSRVMLPGWFGVGTALEQWIGEGEQATQRIAELQTLNESWPFFT 801 

Query: 824 SLLS^nOTm,SKSN™IALQYAQIAGSKEVRD-V^ 882 

S+L N+ V+SK+ + +A YA L EV + V+++I E4- LTK M I D+LL+ 
Sbjct: 802 SVLD1OTAQVMSKAELRLAKLYADLIPDTEVAERVYSVIREEYFLTKKMFCVITGSDDLLD 861 

Query: 883 ENPMLHASLDYRLPYFNVLNYVQIELIKRLRSNQLDEDYEECLIHITINGIATGLRNSG 940 

+NP+L S+ R PY LN +Q+E+++R R E + I +T+NG++T LRNSG 

Sbjct: 862 DNPLLARSVQRRYPYIjLPLNVIQVEMMRRYRKGDQSEQVSRNIQLTMNGLSTALRNSG 919 

A related GBS nucleic acid sequence <SEQ ID 10961> which encodes amino acid sequence <SEQ ID
10962> was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1039> which encodes the amino acid
sequence <SEQ ID 1040>. Analysis of this protein sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1613 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 659/927 (71%) , Positives = 779/927 (83%) , Gaps = 11/927 (1%) 



Query: 14 KLESSSNKEIITEEVGLLKQLLDEATQKLIGSESFDKIEKIVSLSLTDDYTGLKETISAL 73
KLESS+N++II EEV LLK++L+ T+++IG ++F IE I+ LS DY L++ ++ +
Sbjct: 5 KLESSNNQDIIAEEVALLKEMLENITRRt^IGDDAFTVIESI^WLSEKQDYIELEKVVANI 64

Query: 74 SNEEMVIVSRYFSILPLLINISEDVDLAYEINYKNNLNQDYLGKLSTTIDVVAGHENAKD 133
SN+EM ++SRYFSILPLLINISEDVDLAYEINY+NN + DYLGKL+ TI +AG +N KD
Sbjct: 65 SNQEMEVISRYFSILPLLINISEDVDLAYEINYQNNTDTDYLGKLALTIKDLAGKDNGKD 124

Query: 134 ILEHVNWPVLTAHPTQVQRKTvLELTSKIHDLDRKYRDVmGIVNQEKWYADLRRYI 193
ILE VNWPVLTAHPTQVQRKT+LELT+ IH LLRKYRD KAG++N EKW +L RYI +
Sbjct: 125 ILEQVNWPVLTAHPTQVQRKTILELTTHIHKLLRKYRDAKAGVINLEKWRQELYRYIEM 184

Query: 194 IMQTDTIREKKL3CVKNEITNVMEYY1TOSLIKAVTKLTAEYKAIAAKKGIHLENPKPLTMG 253
IMQTD IREKKL+VKNEI NVM4YY+ SLI+AVTKLT EYK LA K G+ L+NPKP+TMG
Sbjct: 185 IMQTDIIREKKLQVKNEIK3WMQYYDGSLIQAVTKLTTEYKNLAQKHGLELDNPKPITMG 244

Query: 254 MWIGGDRDGNPFVTAETLRLSAMVQSEVIINHYIEQI^LYRNMSLSINLTEVSPELVTL 313
MWIGGDRDGNPFVTAETL LSA VQSEVI+N+YI++I. LYR SLS L + + E+ L
Sbjct: 245 MWIGGDRDGNPFVTAETLCLSATVQSEVirjm'IDELAaiiYRTFSLSSTLVQPNSEVERL 304

Query: 314 ANQSQDNSVYRFJffiPYRKAFNFIQDKTjVQTLLNLKVGSSPKEKFVSRQESSDIVGRYIKS 373
A+ SQD S+YR NEPYR+AF++IQ +LQT + I1 ++SS+ S
Sbjct: 305 ASLSQDQSIYRGNEPYRRAFHYIQSRIjKQTQIQI.T NQPAASMSSSVGLNTSAWS 358

Query: 374 HIAQVASDIQTEELPAYATAEEFKQDLLLVKQSLVQYGQDSLVDGELACLIQAVDIFGFY 433
A + + I AY + +FK DL ++QSL+ G +L++G+L ++QAVDIFGF+
Sbjct: 359 SPASLENPIL AYDSPVDFKADLKAIEQSIiIiDNGNSALI EGDLREVMQAVDI FGFF 413

Query: 434 LATIDMRQDSSINEACVAELLKSANIVDDYSSLSEEEKCQLLLKELTEDPRTLSSTHAPK 493
LA+IDMRQDSS+ EACVAELLK ANIVDDYSSLSE EKC +LL++L E+PRTLSS K
Sbjct: 414 LASIDMRQDSSVQEACVAELLKGANIVDDYSSLSETEKCDVLLQQLMEEPRTLSSAAVAK 473

Query: 494 SELLQKELAIFQTARELKDQLGEDIINQHIISHTESVSDMFELAIMLKEVGLIDANQARI 553

S+LL+KELAI+ TARELKD+LGE++I QHIISHTESVSDMFELAIMLKEVGL+D +AR+ 
Sbjct: 474 SDLLEKELAIYTTARELKDKLGEEVIKQHIISHTESVSDMFELAIMLKEVGLVDQQRARV 533 

Query: 554 QIVPLFETIEDLDNSRDIMTQYLHYELVKKWIATNNNYQEIMLGYSDSNKDGGYLSSGWT 613 

QIVPLFETIEDLDN+RDIM YL +++VK WIATN NYQEIMLGYSDSNKDGGYL+SGWT 
Sbjct: 534 QIVPLFETIEDLDNARDIMAAYLSHDIVKSWIATNRNYQEIMLGYSDSNKDGGYLASGWT 593 

Query: 614 LYKAQNELTKIGEENGIKITFFHGRGGTVGRGGGPSYEAITSQPFGSIKDRIRLTEQGEI 673

LYKAQNELT IGEE+G+KITFFHGRGGTVGRGGGPSY+AITSQPFGSIKDRIRLTEQGEI 
Sbjct: 594 LYKAQNELTAIGEEHGVKITFFHGRGGTVGRGGGPSYDAITSQPFGSIKDRIRLTEQGEI 653 

Query: 674 IENKYGNQDAAYYNLEMLISASIDRMVTRMITNPNEIDNFRETMDGIVSESNAVYRNLVF 733

IENKYGN+D AYY+LEMLISASI+RMVT+MIT+PNEID+FRE MD IV++SN +YR LVF 
Sbjct: 654 IENKYGNKDVAYYHLEMLISASINRMVTQMITDPNEIDSFREIMDSIVADSNIIYRKLVF 713 

Query: 734 DNPYFYDYFFEASPIKEVSSLNIGSRPAARKTITEISGLRAIPWVFSWSQNRIMFPGWYG 793 
DNP+FYDYFFEASPIKEVSSLNIGSRPAARKTITEI+GLRAIPWVFSWSQNRIMFPGWYG 

Sbjct: 714 DNPHFYDYFFEASPIKEVSSLNIGSRPAARKTITEITGLRAIPWVFSWSQNRIMFPGWYG 773 

Query: 794 VGSAFKHFIEQDEANIAKLQTMYQKWPFFNSLLSNVD^WLSKSN^MIALQYAQLAGSKEV 853 
VGSAFK +I++ + NL +LQ MYQ WPFF+SLLSNVDMVLSKSNMNIA QYAQLA ++V 
Sbjct: 774 VGSAFKRYIDRAQGNLERLQHMYQTWPFFHSLLSNVDMVLSKSNMNIAFQYAQLAERQDV 833

Query: 854 RDVFNIILNEWQLTKDMIIAIEQHDNLLEENPMLHASLDYRLPYFNVLNYVQIELIKRLR 913

RDVF IL+EWQLTK++ILAI+ HD+LLE+NP L SL RLPYFNVLNY+QIELIKR R 
Sbjct: 834 RDVFYEILDEWQLTKWIIAIQDHDDLLEDNPSLKHSLKSRLPYFNVIJSIYIQIELIKRWR 893 

Query: 914 SNQLDEDYEKLIHITINGIATGLRNSG 940 

+NQLDE+ EKLIH TINGIATGLRNSG 
Sbjct: 894 NNQLDENDEKLIHTTINGIATGLRNSG 920 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 322 

A DNA sequence (GBSx0352) was identified in S.agalactiae <SEQ ID 1041> which encodes the amino
acid sequence <SEQ ID 1042>. This protein is predicted to be Bacillus licheniformis Pz-peptidase
homologue (pepF). Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3012 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 1043> which encodes the amino acid
sequence <SEQ ID 1044>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3137 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below:

Identities = 512/593 (86%) , Positives = 564/593 (94%)

Query: 1 MKLKKRSEFPENELWJLTALYKDRQDFLLAIEKALEDIKVFKKOTEGKmCVEDFTSALM 60 

M+LKKRSEFPENELWDLTALYKDRQDFLLAIEKAL+DI +FK+NYEG+L V+DFT AL+ 
Sbjct: 26 MELKKRSEFPENELWDLTALYKDRQDFLLAIEKftLQDIDIjFKRNYEGRLTSVDDFTQALI 85 

Query: 61 EIEHIYIQMSHIDTYAFMPQTTDFSlffiEFAQISQAGSDFATKANVLLSFFNTALaNADIK 120 

EIEHIYIQMSHI TYAFMPQTTDFS+E FAQI+QAG DF TKA+V LSFF+TALANAD+ 
Sbjct: 86 EIEHIYIQMSHIGTYAFMPQTTDFSDESFAQIAQAGDDFMTKASVALSFFDTALANADLD 145 

Query: 121 ILDSLENNPHFKATIRQAKIQKQHLLSPEVEKALTNI^ 180 

+LD+LE NP+F A IR AKIQK+HLLSP+VEKAIi NL EV+N PYDIYTKMRAGDFDM+D 
Sbjct: 146 VIiDTLEKNPYFSAAIRMAKIQKEHLLSPDvEKALftNIjREVINAPYDIYTKMRAGDFDMDD 205 

Query: 181 FEVDGKTYKNSFVTYENYFQNHENAEIREKSFRSFSKGLRKHQNAAAAAYLAKVKSEKLI 240

FEVDGKTYKNSFV+YEN++QNHENAEIREK+FRSFSKGLRKHQN AAAAYLAKVKSEKL+
Sbjct: 206 FEVDGKTYKNSFVSYENFYQNHENAEIREKAFRSFSKGLRKHQNTAAAAYLAKVKSEKLL 265

Query: 241 ADMRGYDSVFDYLLSEQEVDRSMFDRQIDLIMDEFGPVAQRFLKHIADVNGIEKMTFADW 300
ADM+GY SVFDYLL+EQEVDRS+FDRQIDLIM EFGPVAQ+FLKH+A VNG+EKMTFADW

Sbjct: 266 ADMKGYASVFDYLLAEQEVDRSLFDRQIDLIMTEFGPVAQKFLKHVAQVNGLEKMTFADW 325

Query: 301 KLDIDNELNPEVSINDAYDLVMKSVAPLGKEYSQEVERYQKERWVDFAANANKDSGGYAA 360
KLDIDN+LNPEVSI+ AYDLVMKS+APLG+EY++E+ERYQ ERWVDFAANANKDSGGYAA
Sbjct: 326 KLDIDM3LNPEVSIDGAYDLVMKSIAPLGQEYTKEIERYQTERWVDFAANANKDSGGYAA 385

Query: 361 DPYKVHPYVI,MSOTGRMSDWTLIHEIGHSGQFIFSDNHQSFFOTHMSTYYVEAPSTFNE 420 

DPYKOTPYVLMSWTGRMSDWTLIHEIGHSGQFIFSDNHQS+FNTHMSTYYVEAPSTFNE 
Sbjct: 386 DPYKVHPYVIiMSWTGRMSDvYTLIHEIGHSGQFIFSDNHQSYFNTHMSTYYVEAPSTFNE 445 

Query: 421 LLLSDYLENQFDTARQKRFALAHRLTDTYFHNFITHLLEAAFQRKVYTLIEEGGTFGAEQ 480 

L+LSDYLE+QFD RQKRFALAHRLTDTYFHNFITHLLEAAFQRKVYTLIEEGGTFGA+Q 
Sbjct: 446 LMLSDYLEHQFDDPRQKRFALAHRLTDTYFHNFITHLLEAAFQRKVYTLIEEGGTFGADQ 505 

35 Query: 481 IiNAIrtKEvLTQFWGDAIEIDDDAALTWMRQAHYYMGLYSYTYSAGLVISTAGYLNLKNNP 540 

LNA+MKEVLT FWGDA+ + IDDDAALTWMRQAHYYMGLYSYTYSAGLVI STAGYLNLK+NP 
Sbjct: 506 I^AmKEVLTDFWGDAVDIDDDAALTWMRQAHYYMGLYSYTYSAGLVISTAGYraLKHNP 565 

Query: 541 NGAKEWLAFLKSGGSRTPLETALLISADISTDKPLRDTINFLSNTVDQIINYS 593 
40 NGAKEWL FLKSGGSRTPL+TA+LI ADI+T+KPLRDTI FLS+TVDQI I+Y+ 

Sbjct: 566 NGAKEWLDFLKSGGSRTPLDTAMLIGADIATEKPLRDTIQFLSDTVDQIISYT 618 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 323

A DNA sequence (GBSx0353) was identified in S.agalactiae <SEQ ID 1045> which encodes the amino 
acid sequence <SEQ ID 1046>. Analysis of this protein sequence reveals the following: 






Possible site: 19 

>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1047> which encodes the amino acid 
sequence <SEQ ID 1048>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below: 

Identities = 72/127 (56%), Positives = 85/127 (66%)

Query: 1 MKKYIKLFLLTVFATTLVACGQPSTSNKTTTSSTLEVGKVELWKEDTNVLSEKWYHKG 60 

+ K K L + A LVAC Q + +TT S V LWKEDTN + EKV + KG 

Sbjct: 1 WKRFKTGFLALVAMLLVACSQGTKQIQTTPSVPKADHHVRLWKEDTNTVDEKVSFGKG 60 

Query: 61 DTVLDVLKANYKVKEKDGFITSIDGISQDETKGLY^FKVNNKIAPKAANQIKVKKNDKI 120

DTVL+VLK NY+VKEKDGFIT+IDGI QD YW+FKVN K+A K A+QI VK D I

Sbjct: 61 DTVLEVLKDNYEVKEKDGFITAIDGIEQDTKANKYWLFKVNGKMADKGADQITVKDGDSI 120

Query: 121 EFYQEVY 127

EFYQEV+ 
Sbjct: 121 EFYQEVF 127 

SEQ ID 1046 (GBS185) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 28 (lane 6; MW 15.7kDa).

GBS185-His was purified as shown in Figure 199, lane 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
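
The identity figure quoted with the GAS/GBS alignment in this example can be re-derived from the aligned rows. The following is a minimal sketch in Python, not part of the original analysis: the function names are hypothetical, and it assumes that the reported percentage is truncated to an integer rather than rounded, which is an inference from the figures printed in this section.

```python
def count_identities(query: str, sbjct: str) -> int:
    """Count aligned columns in which both rows carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    # Gap characters ("-") never count as an identity.
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


def identity_line(identical: int, length: int) -> str:
    """Format a summary line, truncating the percentage (assumption)."""
    percent = identical * 100 // length
    return f"Identities = {identical}/{length} ({percent}%)"


# Final block of the alignment (residues 121-127 of both proteins).
print(count_identities("EFYQEVY", "EFYQEVF"))  # 6
# Whole-alignment totals as quoted in the text.
print(identity_line(72, 127))  # Identities = 72/127 (56%)
```

Counting "Positives" would additionally need a substitution matrix, so it is left out of this sketch.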

Example 324 

A DNA sequence (GBSx0354) was identified in S.agalactiae <SEQ ID 1049> which encodes the amino
acid sequence <SEQ ID 1050>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.46 Transmembrane 75 - 91 ( 67 - 94)
INTEGRAL Likelihood = -4.41 Transmembrane 33 - 49 ( 30 - 49)
INTEGRAL Likelihood = -2.60 Transmembrane 53 - 69 ( 52 - 70)
INTEGRAL Likelihood = -1.38 Transmembrane 108 - 124 ( 106 - 124)
INTEGRAL Likelihood = -0.06 Transmembrane 149 - 165 ( 149 - 165)

Final Results

bacterial membrane Certainty=0.2784 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9731> which encodes amino acid sequence <SEQ ID 9732>
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10929> which encodes amino 
acid sequence <SEQ ID 10930> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1051> which encodes the amino acid
sequence <SEQ ID 1052>. Analysis of this protein sequence reveals the following:

Possible site: 48 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.96 Transmembrane 50 - 66 ( 49 - 71)

INTEGRAL Likelihood = -5.73 Transmembrane 101 - 117 ( 99 - 124)

INTEGRAL Likelihood = -4.41 Transmembrane 141 - 157 ( 139 - 159)

INTEGRAL Likelihood = -4.25 Transmembrane 73 - 89 ( 67 - 92)

Final Results

bacterial membrane Certainty=0.4185 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below: 

Identities = 82/163 (50%), Positives = 120/163 (73%), Gaps = 3/163 (1%)

Query: 10 LTRVAILSALCVVLRYAFAPLPNIQPITAIFLITVVLFDLKEGVATVTITMLVSSFLMGF 69
++R+AI+SALCVVLR F+ LPN+QP+TA L ++FLEV + + + +S+FL+GF
Sbjct: 6 MSRIAIMSALCVVLRMVFSSLPNVQPVTAFLLSYLLYFGLAKAVLVMMLCLFLSAFLLGF 65

Query: 70 GPWVFLQIISFTLILCLWKFLIYPLTKAVCFGKITEWLQTFFAGGLGWYGVIIDTCFA 129
GPWVF Q+ F L+L LW+F++YPL++ F K ++ Q F G++YGV+IDTCFA
Sbjct: 66 GPWVFWQVTCFVLVLLLWRFVLYPLSQQ--FPKY-QLGCQAFLVALCGLLYGVLIDTCFA 122

Query: 130 WLYHMPWWTYVLAGLSFNMAHALSTCLFYPLLLPILRRFRNEK 172
+LY MPWW+YVLAG+ FN+AHALST +F+P+++ + RR E+
Sbjct: 123 YLYSMPWWSYVLAGMPFNIAHALSTLVFFPVVMMLFRRLIGEQ 165

A related GBS gene <SEQ ID 8549> and protein <SEQ ID 8550> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 

McG: Discrim Score: 6.79 

GvH: Signal Score (-7.5) : -0.91 
Possible site: 28 

>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 3 value: -4.46 threshold: 0.0 

INTEGRAL Likelihood = -4.46 Transmembrane 35 - 51 ( 29 - 54) 
INTEGRAL Likelihood = -1.38 Transmembrane 68 - 84 ( 66 - 84) 
INTEGRAL Likelihood = -0.06 Transmembrane 109 - 125 ( 109 - 125) 
PERIPHERAL Likelihood =7.53 88 
modified ALOM score: 1.39 
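
The ALOM summary line can be cross-checked against the segment rows listed beneath it. The sketch below uses hypothetical helper names and is not part of PSORT or ALOM. It assumes that `count` is the number of INTEGRAL segments whose likelihood lies below the threshold and that `value` is the lowest of those likelihoods, which is consistent with the figures shown here but is not stated in the text.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# Matches rows such as
# "INTEGRAL Likelihood = -4.46 Transmembrane 35 - 51 ( 29 - 54)"
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_integral(rows: List[str]) -> List[Segment]:
    """Extract likelihood and core span from every INTEGRAL row."""
    segments = []
    for row in rows:
        m = _INTEGRAL.search(row)
        if m:
            segments.append(Segment(float(m.group(1)), int(m.group(2)), int(m.group(3))))
    return segments


def summarise(segments: List[Segment], threshold: float = 0.0) -> Tuple[int, Optional[float]]:
    """Return (segments below threshold, lowest likelihood among them)."""
    below = [s.likelihood for s in segments if s.likelihood < threshold]
    return len(below), (min(below) if below else None)


rows = [
    "INTEGRAL Likelihood = -4.46 Transmembrane 35 - 51 ( 29 - 54)",
    "INTEGRAL Likelihood = -1.38 Transmembrane 68 - 84 ( 66 - 84)",
    "INTEGRAL Likelihood = -0.06 Transmembrane 109 - 125 ( 109 - 125)",
    "PERIPHERAL Likelihood =7.53 88",  # ignored: not an INTEGRAL row
]
print(summarise(parse_integral(rows)))  # (3, -4.46)
```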

*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.2784 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01220(421 - 552 of 1002) 

GP|9950155|gb|AAG07353.1|AE004814_8|AE004814(16 - 56 of 69) hypothetical protein
{Pseudomonas aeruginosa} 
%Match =3.2 

%Identity =39.5 %Similarity =60.5 

Matches = 17 Mismatches = 15 Conservative Sub.s = 9 

222 252 282 312 342 372 402 432 

STLTKLTRVAILSALCVVLRYAFAPLPNIQPITAIFLITVVLFDLKEGVATVTITMLVSSFLMGFGPWVFLQIISFTLIL

|::: 

MDPELFEEWMMTGLVTVLI 
10 



462 492 522 552 582 612 642 672 

CLWKFLIYPLTKAVCFGKITEWLQTFFAGGLGWYGVIIDTCFAWLYHMPWWTYVLAGLSFNMAHALSTCLFYPLLLPI
= l = = : I I II = = l III Mil I II

LFMAFIVWDLAKKSKAGKFGTLIL- - FFALGLGV- LGFI IKGLVIGSLEGAGM 
30 40 50 60 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 325 

A DNA sequence (GBSx0355) was identified in S.agalactiae <SEQ ID 1053> which encodes the amino 
acid sequence <SEQ ID 1054>. This protein is predicted to be endolysin. Analysis of this protein sequence 
reveals the following:

Possible site: 28 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA72266 GB:Y11477 endolysin [Bacteriophage Bastille]

Identities = 64/210 (30%), Positives = 95/210 (44%), Gaps = 15/210 (7%)

Query: 66 KPIIDVSGWQLPKEIDYDTLSKNISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEF 125
K I+D+S +ID+DT +S + R G + + +N +D+ +KT +
Sbjct: 12 KTIVDISHHNA--DIDFDTAKNYVSMFIARTGDGHRYN--SNGELQGVVDRKYKTFVANM 67

Query: 126 QKRNIPVAVYSYALGSSVKEMKEEAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFR 185
+ R IP Y + S V K+EA+ F+N T+DETNM + +QF
Sbjct: 68 KARGIPFGNYMFNRFSGVASAKQEAEFFW-NYGDKDATVWVCDAEVSTAPNMKECIQVFI 126

Query: 186 KELKRLGAKNVGIYIGTYFMTEQGISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQY 245
LK LGAK VG+YIG + EG D WIP YG+ + DL Q+
Sbjct: 127 DRLKELGAKKVGLYIGHHKYQEFGGKDVNCDFTWIPRYGNKPAF ACDLWQW 177

Query: 246 TSQGYLPGFNQPLDLNQIAVNKDKKKTYEK 275

T G + G + D+N + +K EK 
Sbjct: 178 TEYGNIAGIGK-CDINVLYGDKPMSFFTEK 206 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1055> which encodes the amino acid
sequence <SEQ ID 1056>. Analysis of this protein sequence reveals the following:

Possible site: 31 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-16.98 Transmembrane 8 - 24 ( 3 - 28)

Final Results

bacterial membrane Certainty=0.7793 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 198/278 (71%), Positives = 235/278 (84%)

Query: 1 MRRRIKPIVVAVFFSLFGLLLIIGHLHSTNTLKKELVEAKKTIPSVKASKVPQKSTSSKD 60
MRR+IKPIVV VFF L ++LIIG + + +KE+ +AK IP ++ K+++S+
Sbjct: 1 MRRKIKPIVVLVFFILLAMVLIIGKRQANHAKQKEVEDAKSHIPIATSNPGKAKTSTSET 60

Query: 61 KEFVLKPIIDVSGWQLPKEIDYDTLSKNISGVVIRVFGGSKISKTNNAAYTTGIDKSFKT 120
++F+L PI+DVSGWQLP+EIDYDTLS++ISG ++RV+GGS+I+ NNAA+TTGIDKSFKT
Sbjct: 61 EDFILNPIVDVSGWQLPEEIDYDTLSRHISGAIVRVYGGSQITAHNNAAFTTGIDKSFKT 120

Query: 121 HIKEFQKRNIPVAVYSYALGSSVKEMKEEAQIFYKNAAPYKPTFYWIDVEEETMSNMNKG 180
HIKEFQKRN+PVAVYSYALG S KEMKEEA+ FYKNAAPY PT+YWIDVEE TM +MNKG
Sbjct: 121 HIKEFQKRIWPVAVYSYALGRSTKEMKEEARAFYKNAAPYNPTYYWIDVEEATMKDMNKG 180

Query: 181 VQAFRKELKRLGAKNVGIYIGTYFMTEQGISVKGFDAVWIPTYGSDSGYYEAAPQTELKY 240
V AFR+ELK+LGA+NVG+YIGTYFM EQ IS KGFD+VWIPTYGSDSGYYEAAP T L Y
Sbjct: 181 OTAFREELKKLGAENVGLYIGTYFMAEQDISTKGFDSVWIPTYGSDSGYYEAAPNTTLDY 240

Query: 241 DLHQYTSQGYLPGFNQPLDLNQIAVNKDKKKTYEKLFG 278
DLHQYTSQGYL GFN LDLNQIAV KD KKT+EKLFG
Sbjct: 241 DLHQYTSQGYLSGFNNALDLNQIAVTKDTKKTFEKLFG 278

A related GBS gene <SEQ ID 8551> and protein <SEQ ID 8552> were also identified. Analysis of this
protein sequence reveals the following:

Lipop Possible site: -1 Crend: 5 
McG: Discrim Score: 13.20 
GvH: Signal Score (-7.5): -0.72 

Possible site: 28 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 7.05 threshold: 0.0 
PERIPHERAL Likelihood = 7.05 196 
modified ALOM score: -1.91 

*** Reasoning Step: 3 



Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

32.4/47.3% over 194aa
Bacteriophage Bastille
GP|1865711| endolysin Insert characterized
ORF01218(496 - 1125 of 1446)

GP|1865711|emb|CAA72266.1||Y11477(12 - 206 of 364) endolysin {Bacteriophage Bastille}
%Match =7.9 

%Identity =32.3 %Similarity =47.3 

Matches = 65 Mismatches = 100 Conservative Sub.s = 30 

315 345 375 405 435 465 495 525 

VTISimRRIKPIWAVFFSLFGLLLIIGHLHSTNTLKOTLvEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQ 

=1 I 1=1=1 = 

MALEANKYPKEKTIVDIS- -H 
10 

555 585 615 645 675 705 735 765 

LPKEIDYDTLSKNISGWIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKEEAQIFYK 

=11=11 =11 =1 I =1 =1= =11 = = I II I = II l=l|= |= 

HNADIDFDT-AKNOTSMFIARTGDGHRYNSN-GELQGVVDRKYKTFVAN^ 

30 40 50 60 70 80 90 

795 825 855 885 915 945 975 1005 

NAAPYKPTFYWIDVEEETMSlSMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQGISVKGFDAvWIPTYGSDSGYYFAA 

i i = i i i ii = =i i ii mi 11 = 111 =11 I III II = 

NYGDKDATVWVCDAEVSTAPNMKECIQVFIDRLKELGAKKVGLYIGHHKYQEFGGKDVNCDFTWIPRYG NK 

110 120 130 140 150 160 



1035 1065 1095 1125 1155 1185 1215 1245 

TELKyDLHQYTSQGYLPGXNQPLDLNQIAVNKDKKKTYEKLFGKVKE*KLLLTVAFLINYLLFNSSIERIFW 



WO 02/34771 



PCT/GB01/04789 



-414- 

PAFACDLWQWTEYGNIAGIGK-CDINVLYGDKPMSFFTEKEGAKETLVPALl^^ 

180 190 200 210 220 230 240 

SEQ ID 8552 (GBS206) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 51 (lane 6; MW 31.7kDa).

GBS206-His was purified as shown in Figure 206, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 326 

A DNA sequence (GBSx0356) was identified in S.agalactiae <SEQ ID 1057> which encodes the amino
acid sequence <SEQ ID 1058>. Analysis of this protein sequence reveals the following: 






Possible site: 41 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.44 Transmembrane 183 - 199 ( 183 - 200)

Final Results

bacterial membrane Certainty=0.1574 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9729> which encodes amino acid sequence <SEQ ID 9730> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG20117 GB:AE005090 NADH dehydrogenase/oxidoreductase-like 
protein; NolA [Halobacterium sp. NRC-1]

Identities = 38/156 (24%), Positives = 83/156 (52%), Gaps = 13/156 (8%)

Query: 19 TMEILIAGGSGFLGKQIIKAALTKGHKVAYLSRHEGKGDIFKDPRLTYIRGDITFADKIH 78 
+M++L+ GG+GF+G + + +GH V +R + D +T I GD+T + + 

Sbjct: 8 SMDVLOTGGTGFIGTHLCRELDDRGHDVTAFAREPADAALPAD--VTRIVGDVTVKETVA 65

Query: 79 LEDRTFDILIDCIGA IKPNQLD ELNVKATQKAVALCHKNQIPKLVYISA 127 

D +++ + KP+ D ++++ T+ VA + + ++ +SA 

Sbjct: 66 NAIDGHDAWNLVALSPLFKPSGGDSRHLDVHLGGTENWAAASEAGVEYILQLSALDAD 125 



Query: 128 NSGYSAYIKSKRKAEQIIKASGLDYLFVRPGLMYGE 163 

+G +AY+++K +AE+ +++S L + VRP +++G+ 
Sbjct: 126 PTGPTAYLRAKGRAEEATOSSDLHHTIVRPSWFGD 161 



No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8553> and protein <SEQ ID 8554> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop Possible site: -1 Crend: 5 
McG: Discrim Score: -7.99 
GvH: Signal Score (-7.5): -6.34

Possible site: 41
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -1.44 threshold: 0.0

INTEGRAL Likelihood = -1.44 Transmembrane 183 - 199 ( 183 - 200)
PERIPHERAL Likelihood = 4.29 20

modified ALOM score: 0.79 



*** Reasoning Step: 3 
Final Results

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

RGD motif 68-70



The protein has homology with the following sequences in the databases: 

32.5/54.4% over 274aa
Schizosaccharomyces pombe
GP|3395590| hypothetical protein Insert characterized
PIR|T41177|T41177 hypothetical protein SPCC1840.09 - fission yeast Insert characterized
ORF01216(358 - 990 of 1272)

GP|3395590|emb|CAA20132.1||AL031179(1 - 275 of 276) hypothetical protein
{Schizosaccharomyces pombe} PIR|T41177|T41177 hypothetical protein SPCC1840.09 - fission
yeast (Schizosaccharomyces pombe)
%Match =7.3 

%Identity =32.4 %Similarity =54.3 

Matches = 71 Mismatches = 88 Conservative Sub.s = 48 

144 174 204 234 264 294 324 354 

*L**ISTDS*K*A*IPFQGIMIINIATVLFGMI^*KFYK*IiNMKCPDVMT*NHTvVRY*TITLTRHIKISILNLQNEGEG 

384 414 444 474 504 534 564 

TMEILIAGGSGFLGKQIIKAALTKGHKVAYLSRHEGKGDIFKDPRLTYIRGDITEADKIHLEDRTFDILIDCIGAI 



MKI VITLGGSGFLGHNICKLAIAKGYEWSVSRRGAGGLHNKEPWMDDV^ 

10 20 30 40 50 60 70 

585 615 648 678 

KPNQLDELNVKATQKAV - - -ALCHKNQIPKLVYIS 

|| : : I I :|: : | :| |s| 

ILMENNYKKILQNPRGPVSHLINSLSSISMFKTGQNPLiAPKPEEAKQSKNKOT 

90 100 110 120 130 140 150 

699 726 753 783 810 840 846 876 

ANS- - -GYSA-YIKSKRKAE-QIIKASGLDYLFVRPGLMYG-EERPLSIFQAKCIKLFSHL PFLGIWQKVF 

h: I ll|:||:|l =1111 =1 = 111 = 11 =ll = = 1=1= III = = 

AHAAAPGLDPRYIKTKREAEREISKISNLRSIFLRPGF^TYNFNDRPFTGAIASLFTVSSSINRATSGALNFLGTASAEPL 
170 180 190 200 210 220 230 

930 960 990 1020 1050 1080 1110 



PTK- VVIVA-EAIWTLRKKPTQKILSIEEIiNNK*FIKKATVNSSFYSFTFPKSFS*VFFLSLIjTAI*FKSSG*LXPGR* 

l== I = I III I I = I == = =1 I =1= 

PSEEVALAALEAISDPSVKGPVE-ISELKSMAHK-FKQKSL 
250 260 270 

SEQ ID 8554 (GBS303) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 51 (lane 5; MW 28.3kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 55 (lane 5; MW 53.2kDa). 

The GBS303-GST fusion product was purified (Figure 207, lane 6) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 275), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
Example 327

A DNA sequence (GBSx0357) was identified in S.agalactiae <SEQ ID 1059> which encodes the amino 
acid sequence <SEQ ID 1060>. Analysis of this protein sequence reveals the following: 

Possible site: 49 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2850 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
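
Each "Final Results" block lists three candidate localisations with a certainty score. A minimal sketch of reading such a block and picking the highest-scoring site is given below. The regular expression and function names are assumptions made for illustration and are not part of the prediction program; the pattern tolerates the stray spaces that can appear inside the numbers in extracted text.

```python
import re
from typing import Dict

# Captures the site name and the (possibly space-damaged) certainty value,
# e.g. "bacterial cytoplasm Certainty=0. 2850 (Affirmative)".
_RESULT = re.compile(r"bacterial\s+(\w+)\W*Certainty\s*=\s*(\d\s*\.\s*\d+)")


def parse_final_results(text: str) -> Dict[str, float]:
    """Map each predicted site to its certainty score."""
    scores = {}
    for site, raw in _RESULT.findall(text):
        scores[site] = float(re.sub(r"\s+", "", raw))
    return scores


def best_site(scores: Dict[str, float]) -> str:
    """Return the site with the highest certainty (first one wins on ties)."""
    return max(scores, key=scores.get)


block = """
bacterial cytoplasm Certainty=0. 2850 (Affirmative)
bacterial membrane Certainty=0 . 0000 (Not Clear)
bacterial outside Certainty=0 . 0000 (Not Clear)
"""
scores = parse_final_results(block)
print(scores)             # {'cytoplasm': 0.285, 'membrane': 0.0, 'outside': 0.0}
print(best_site(scores))  # cytoplasm
```

When all three scores are 0.0000, as in some examples in this section, the helper simply returns the first site listed, so a tie should be treated as "not clear" rather than as a prediction.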

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC36853 GB:L23802 regulatory protein [Enterococcus faecalis] 
Identities = 61/164 (37%), Positives = 96/164 (58%), Gaps = 13/164 (7%)

Query: 1 MSKKNKIKKTLVDQILDKAKIEH DSLQLDALQGDLPNGIQKQDIFKTLALI 51
M+KK +KT +++++ K+ + D L +++ L GI+K IFKTL +
Sbjct: 1 MAKKKTQQKTNAMRMVEQHKVPYKEYEFAWSEDHLSAESVAESL--GIEKGRIFKTLVTV 58

Query: 52 GDKTGPIIGILPLTEHLSEKKLAKISGNKKVQMIPQKDLQKITGYIHGANNPIGIRQKHN 111
G+KTGP++ ++P + L KKLAK SGNKKV+M+ KDL+ TGYI G +P G+ K
Sbjct: 59 GNKTGPVVAVIPGNQELDLKKLAKASGNKKVEMLHLKDLEATTGYIRGGCSPTGM--KKQ 116

Query: 112 YPIFIDTIALEKQELIVSAGEIGRSIRINSEVLADFVNAKFADI 155
+P ++ A + +IVSAG+ G I + E + N +FA+I
Sbjct: 117 FPTYLAEEAQQYSAIIVSAGKRGMQIEIAPEAILSLTNGQFAEI 160

A related DNA sequence was identified in S.pyogenes <SEQ ID 1061> which encodes the amino acid 
sequence <SEQ ID 1062>. Analysis of this protein sequence reveals the following: 

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2651 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 114/157 (72%), Positives = 139/157 (87%)

Query: 1 MSKKNKIKKTLVDQILDKAKIEHDSLQLDALQGDLPNGIQKQDIFKTLALIGDKTGPIIG 60
M+KK K+KKTLV+QILDKA I H L+L+AL+GD P+ +Q DI+KTLAL GD+TGP+IG
Sbjct: 1 MAKKTKLKKTLVEQILDKANIAHQGLKLNALEGDFPDDLQPSDIYKTLALTGDQTGPLIG 60

Query: 61 ILPLTEHLSEKKLAKISGNKKVQMIPQKDLQKITGYIHGANNPIGIRQKHNYPIFIDTIA 120
I+PLTEHLSEK+LAK+SGNKKV M+PQKDLQK TGYIHGANNP+GIRQKH+YPIFID A
Sbjct: 61 IIPLTEHLSEKQLAKVSGNKKVSMVPQKDLQKTTGYIHGANNPVGIRQKHSYPIFIDQTA 120

Query: 121 LEKQELIVSAGEIGRSIRINSEVLADFVNAKFADIKE 157
LEK ++IVSAGE+GRSI+I+S+ LADFV A FAD+K+
Sbjct: 121 LEKGQIIVSAGEVGRSIKISSQALADFVGASFADLKK 157

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 328

A DNA sequence (GBSx0358) was identified in S.agalactiae <SEQ ID 1063> which encodes the amino 
acid sequence <SEQ ID 1064>. Analysis of this protein sequence reveals the following: 
Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4719 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8555> which encodes amino acid sequence <SEQ ID 8556> 
was also identified. This protein belongs to the glycolysis/gluconeogenesis pathway, and such proteins have 
been experimentally detected as surface-exposed in Streptococci. The protein has homology with the 
following sequences in the GENPEPT database: 

>GP:AAD36444 GB:AE001791 phosphoglycerate mutase [Thermotoga maritima]
Identities = 65/191 (34%), Positives = 93/191 (48%), Gaps = 13/191 (6%)



Query: 5 MKFYLVRHGKTQWNLEGRFQGANGDSPLLEEAIEELEELGQYLSSIHFDAVYSSDLGRAR 64
MK YL+RHG+T WN +G +QG D PL E E+ +L L + DA+YSS L R+
Sbjct: 1 MKLYLIRHGETIWNEKGLWQGVT-DVPLNERGREQARKLANSLKRV--DAIYSSPLKRSL 57

Query: 65 DTVNILNDANSCPKEIHYTPQLREWALGTLEGCKIATMQAIYPRQMTAFYQNPLQFKHDM 124
+T + A KEI LRE + G + YP + + +P M
Sbjct: 58 ETAEEI--ARRFEKEIIVEEDLRECEISLWNGLTVEEAIREYPVEFKKWSSDP---NFGM 112

Query: 125 FGAESLYQTTHRVESFLRSIASK NYDKVLIVGHGANLTASIRSLLGYQYGSLHYKD 180
G ES+ +RV + + S+ + V+IV H +L A I +LG LH
Sbjct: 113 EGLESMRNVQNRWKAIMKIVSQEKLNGSENWIVSHSLSLRAFICWILGLPL-YLHRNF 171

Query: 181 KLDNASLTIIE 191
KLDNASL+++E
Sbjct: 172 KLDNASLSVVE 182





A related DNA sequence was identified in S.pyogenes <SEQ ID 1065> which encodes the amino acid 
sequence <SEQ ID 1066>. Analysis of this protein sequence reveals the following: 
Possible site: 24 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3628 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 127/205 (61%) , Positives = 152/205 (73%) 



Query: 5 MKFYLVRHGKTQWNLEGRFQGANGDSPLLEEAIEELEELGQYLSSIHFDAVYSSDLGRAR 64 

MK Y VRHGKT WNLEGRFQGA GDSPLLEEA +E+ LG+ L+ + FDAVY+SDL RA 
Sbjct: 1 MKLYFVRHGKTLWNLEGRFQGAGGDSPLLEEAKDEIHLLGKELAKVAFDAVYTSDLQRAM 60 

Query: 65 DTVNILNDANSCPKEIHYTPQLREWALGTLEGCKIATMQAIYPRQMTAFYQNPLQFKHDM 124 

T I+ DA ++++T QLREW LG LEG KIATM AIYP+QM AF +N QFK D

Sbjct: 61 ATAAIILDAFDQQPKLYHTDQLREWRLGKLEGAKIATMAAIYPQQMLAFRENIAQFKPDQ 120 

Query: 125 FGAESLYQTTHRVESFLRSLASKNYDKVLIVGHGANLTASIRSLLGYQYGSLHYKDKLDN 184 

F AES+YQTT RV ++S K+Y VLIVGHGANLTA+IRSLLG++ L K LDN 
Sbjct: 121 FEAESIYQTTQRVCHLIQSFKDKHYQNVLIVGHGANLTATIRSLLGFEPALLLAKGGLDN 180 



Query: 185 ASLTIIETHDFKDFNCLTWNDKSYL 209
ASLTI+ET D+ ++CL WNDKS+L
Sbjct: 181 ASLTILETKDYLTYDCLIWNDKSFL 205

SEQ ID 8556 (GBS314) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 51 (lane 4; MW 27.2kDa), in Figure 169 (lane 15-17; MW 41.6kDa) and in 
Figure 239 (lane 4; MW 41.6kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE 
analysis of total cell extract is shown in Figure 55 (lane 4; MW 52.1kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 329 

A DNA sequence (GBSx0359) was identified in S.agalactiae <SEQ ID 1067> which encodes the amino 
acid sequence <SEQ ID 1068>. Analysis of this protein sequence reveals the following: 

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3014 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12562 GB:Z99108 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 69/232 (29%), Positives = 108/232 (45%), Gaps = 9/232 (3%)



Query: 4 SIVFDVDDTIYDQQAPYRIAVEKCFPDFDMSAINQAYIRFRHYSDIGFPRVMAGEWTTEY 63
+++FDVDDTI D QA +A+ F D ++ N +++ + + G+ T +
Sbjct: 6 TLLFDVDDTILDFQAAEALALRLLFEDQNIPLTNDMKAQYKTINQGLWRAFEEGKMTRDE 65

Query: 64 FRFWRCKETLLEFGYREIDEATGIYFQEIYEHELENITMLDEMRMTLDFLKSKNVPMGII 123
R L E+GY EA G ++ Y LE L + L + + I+
Sbjct: 66 WNTRFSALLKEYGY EADGALLEQKYRRFLEEGHQLIDGAFDLISNLQQQFDLYIV 121

Query: 124 TNGPTEHQLKKVKKLGLYDYVDPKRVIVSQATGFQKPEKEIFNLAAEQF-DMNPSTTLYV 182
TNG + Q K+++ GL+ + K + VS+ TGFQKP KE FN E+ + TL +
Sbjct: 122 TNGVSHTQYKRLRDSGLFPFF--KDIFVSEDTGFQKPMKEYFNYVFERIPQFSAEHTL1I 179

Query: 183 GDSYDNDIMGAFNGGWHSMWFNHRGRSLKPGIKPVYDVAIDNFEQLFGAVKV 234
GDS DIG G+WN + PIPY+I E+L+ + +
Sbjct: 180 GDSLTADIKGGQLAGLDTCWMNPDMKPNVPEIIPTYE--IRKLEELYHILNI 229
Sbjct: 180 GDSLTADIKGGQLAGLDTCWMNPDMKPNVPEIIPTYE- -IRKLEELYHILNI 229 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1069> which encodes the amino acid 
sequence <SEQ ID 1070>. Analysis of this protein sequence reveals the following: 

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3216 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 276/300 (92%), Positives = 292/300 (97%)

Query: 1 MITSIVFDVDDTIYDQQAPYRIAVEKCFPDFDMSAINQAYIRFRHYSDIGFPRVMAGEWT 60
MIT+IVFDVDDTIYDQQAPYRIA+EKCFPDFDMS +NQAYIRFRHYSD+GFPRVMAGEWT
Sbjct: 1 MITAIVFDVDDTIYDQQAPYRIAMEKCFPDFDMSVMNQAYIRFRHYSDVGFPRVMAGEWT 60

Query: 61 TEYFRFWRCKETLLEFGYREIDEATGIYFQEIYEHELENITMLDEMRMTLDFLKSKNVPM 120
TEYFRFWRCKETLLEFGYREIDEA G++FQE+YEHELENITMLDEMRMTLDFLKSKNVPM
Sbjct: 61

Query: 121
GIITNGPTEHQLKKV+KLGLYDY+D KRVIVSQATGFQKPEKEIFNLAAEQFDMNP TTL
Sbjct: 121

Query: 181
YVGDSYDNDIMGAFNGGWHSMWFNHRGR LKPG KPVYDVAIDNFEQLFGAVKVLFDLPD
Sbjct: 181

Query: 241
NKFIFD+NDK NP+L+MG+NNGLMMAAERLLESNMS+DKWILLRLT +QEKVLR+KYAR
Sbjct: 241



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 330 

A DNA sequence (GBSx0360) was identified in S.agalactiae <SEQ ID 1071> which encodes the amino
acid sequence <SEQ ID 1072>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2451 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9727> which encodes amino acid sequence <SEQ ID 9728> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB11858 GB:Z99104 lysyl-tRNA synthetase [Bacillus subtilis] 
Identities = 318/490 (64%), Positives = 390/490 (78%), Gaps = 1/490 (0%)

Query: 44 EELNDQQIVRREKMAALTEQGIDPFGKRFERTATSGQLNEKYADKSKEDLHDIEETATIA 103
EELNDQ VRR+KM L + GIDPFG RFERT S ++ YD +KE+L + TIA
Sbjct: 9

Query: 104
GR+MTKRGKGK GFAH+QD EGQIQIYVRKDSVG++ YEIFK +DLGD +GV G+V +T+
Sbjct: 69

Query: 164
+GELS+KAT L+KALRPLP+K+HGL D+E YR+R+LDLI N DS F+TRSKII
Sbjct: 129

Query: 224
+RR++D +G+LEVETP +H+ GGASARPFITHHNA DI + +RIA ELHLKRLIVGG+
Sbjct: 189

Query: 284
E+VYEIGR+FRNEG+ HNPEFT IE Y+AYADY+DIM LTE ++ H+ + V G
Sbjct: 249

Query: 344
Y +I + +KR+HMVDAVKE TG+DFW+E+T+E+A+ A+E V + K TVGHII
Sbjct: 309

Query: 404
N FFE+ +E+TLIQPTF++GHPVE+SPLAKKN DPRFTDRFELFI+ +E+ANAFTELND
Sbjct: 368

Query: 464 PIDQLSRFEAQASAKELGDDEATGVDYDYVEALEYGMPPTGGLGIGIDRLCMLLTDTTTI 523
PIDQ RFEAQ +E G+DEA +D D+VEALEYGMPPTGGLGIGIDRL MLLT+ +I
Sbjct: 428 PIDQRERFEAQLKEREAGNDEAHLMDEDFVEALEYGMPPTGGLGIGIDRLVMLLTNAPSI 487

Query: 524 RDVLLFPTMK 533
RDVLLFP M+
Sbjct: 488 RDVLLFPQMR 497

A related DNA sequence was identified in S.pyogenes <SEQ ID 1073> which encodes the amino acid
sequence <SEQ ID 1074>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4694 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 439/500 (87%), Positives = 474/500 (94%)

Query: 34 LEEIMSNQHIEELNDQQIVRREKMAALTEQGIDPFGKRFERTATSGQLNEKYADKSKEDL 93
LEE MSNQHIEELNDQQIVRREKM AL EQGIDPFGKRF+RTA S +L EKYADK+KE+L
Sbjct: 1 LEENMSNQHIEELNDQQIVRREKMTALAEQGIDPFGKRFDRTANSAELKEKYADKTKEEL 60

Query: 94 HDIEETATIAGRLMTKRGKGKVGFAHIQDREGQIQIYVRKDSVGEENYEIFKKADLGDFL 153
H++ ETA +AGRLMTKRGKGKVGFAH+QDREGQIQ+YVRKDSVGE+NYEIFKKADLGDF+
Sbjct: 61 HELNETAIVAGRLMTKRGKGKVGFAHLQDREGQIQLYVRKDSVGEDNYEIFKKADLGDFI 120

Query: 154 GVEGQVMRTDMGELSIKATHITHLSKALRPLPEKFHGLTDIETIYRKRHLDLISNRDSFD 213
GVEG+VMRTDMGELSIKAT +THLSK+LRPLPEKFHGLTDIETIYRKRHLDLISNR+SFD
Sbjct: 121 GVEGEVMRTDMGELSIKATKLTHLSKSLRPLPEKFHGLTDIETIYRKRHLDLISNRESFD 180

Query: 214 RFVTRSKIISEIRRFMDSNGFLEVETPVLHNEAGGASARPFITHHNAQDIDMVLRIATEL 273
RFVTRSK+ISEIRR++D FLEVETPVLHNEAGGA+ARPF+THHNAQ+IDMVLRIATEL
Sbjct: 181 RFVTRSKMISEIRRYLDGLDFLEVETPVLHNEAGGAAARPFVTHHNAQNIDMVLRIATEL 240

Query: 274 HLKRLIVGGMERVYEIGRIFRNEGMDATHNPEFTSIEAYQAYADYQDIMDLTEGIIQHVT 333
HLKRLIVGGMERVYEIGRIFRNEGMDATHNPEFTSIE YQAYADY DIM+LTEGIIQH
Sbjct: 241 HLKRLIVGGMERVYEIGRIFRNEGMDATHNPEFTSIEVYQAYADYLDIMNLTEGIIQHAA 300

Query: 334 KTVKGDGPINYQGTEIKINEPFKRVHMVDAVKEITGIDFWKEMTLEEAQALAQEKNVPLE 393
K V+GDGPI+YQGTEI+INEPFKRVHMVDA+KE+TG DFW EMT+EEA ALA+EK VPLE
Sbjct: 301 KATOGDGPIDYQGTEIRINEPFKRVHMVDAIKEVTGADFWPEMTVEEAIALAKEKQVPLE 360

Query: 394 KHFTTVGHIINAFFEEFVEDTLIQPTFVFGHPVEVSPLAKKNDTDPRFTDRFELFIMTKE 453
KHF +VGHIINAFFEEFVE+TL+QPTFVFGHPVEVSPLAKKN D RFTDRFELFIMTKE
Sbjct: 361 KHFISVGHIINAFFEEFVEETLVQPTFVFGHPVEVSPLAKKNPEDTRFTDRFELFIMTKE 420

Query: 454 YANAFTELNDPIDQLSRFEAQASAKELGDDEATGVDYDYVEALEYGMPPTGGLGIGIDRL 513
YANAFTELNDPIDQLSRFEAQA AKELGDDEATG+DYD+VEALEYGMPPTGGLGIGIDRL
Sbjct: 421 YANAFTELNDPIDQLSRFEAQAQAKELGDDEATGIDYDFVEALEYGMPPTGGLGIGIDRL 480

Query: 514 CMLLTDTTTIRDVLLFPTMK 533
CMLLT+TTTIRDVLLFPTMK
Sbjct: 481 CMLLTNTTTIRDVLLFPTMK 500



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.
Example 331

A DNA sequence (GBSx0361) was identified in S.agalactiae <SEQ ID 1075> which encodes the amino 
acid sequence <SEQ ID 1076>. This protein is predicted to be 6,7-dimethyl-8-ribityllumazine synthase 
(ribH). Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1042 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14257 GB:Z99116 riboflavin synthase (beta subunit) [Bacillus subtilis] 
Identities = 103/151 (68%), Positives = 120/151 (79%)

Query: 1 MTIIEGQLVANEMKIGIVVSRFNELITSKLLSGAVDGLLRHGVSEEDIDIVWVPGAFEIP 60
M II+G LV +KIGIVV RFN+ ITSKLLSGA D LLRHGV DID+ WVPGAFEIP
Sbjct: 1 milC^NLVGTGLKIGIVVGRFNDFITSKLLSGAEDALLRHGVDTNDIDVAWVPGAFEIP 60

Query: 61 YMARKMALYKDYDAIICLGVVIKGSTDHYDYVCNEVTKGIGHLNSQSD1PHIFGVLTTDN 120
+ A+KMA K YDAII LG VI+G+T HYDYVCNE KGI + + +P IFG++TT+N
Sbjct: 61 FAAKKMAETKKYDAIITLGTVIRGATTHYDYVCNEAAKGIAQAANTTGVPVIFGIVTTEN 120

Query: 121 IEQAIERAGTKAGNKGYDCALSAIEMVNLDK 151
IEQAIERAGTKAGNKG DCA+SAIEM NL+ +
Sbjct: 121 IEQAIERAGTKAGNKGVDCAVSAIEMANLNR 151

No corresponding DNA sequence was identified in S.pyogenes. 

30 Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 332 

A DNA sequence (GBSx0362) was identified in S.agalactiae <SEQ ID 1077> which encodes the amino 
acid sequence <SEQ ID 1078>. This protein is predicted to be GTP cyclohydrolase II (ribA/B). Analysis of
this protein sequence reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1918 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9725> which encodes amino acid sequence <SEQ ID 9726>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA86524 GB:U27202 GTP cyclohydrase II/3,4-dihydroxy-2-butanone-4-phosphate synthase
[Actinobacillus pleuropneumoniae]
Identities = 230/395 (58%), Positives = 307/395 (77%)

Query: 19  FSPIKKIjLQDIKSGKIWvLMDDENRENEGDLICAAEMVTKESINFMAKFGKGLICLPLSN 78
           FS ++ ++ 1+ GK++++ DDE+RENEGD ICAAE T E+INFMA +GKGLIC P+S
Sbjct: 6   FSKVEDAIEAIRQGKIILVTDDEDRENEGDFICAAEFATPENINFMATYGKGLICTPIST 65

Query: 79  YYAEKLELAQMASHOTDNHETAFTISIDHLSTSTGISAEDRALTAKMVA1TOSSKAKDFRR 138
           A+KL M + N DNHETAFT+S+DH+ T TGISA +R++TA + +D++KA DFRR
Sbjct: 66  EIAKKljNFHPlWAVNQDmETAFTVSVDHIDTGTGISAFERSITAMKIVDDNAKATDFRR 125

Query: 139 PGHLFPLIAKEGGVIARNGHTEATVDLCRIAGLKECGLCCEIMAEDGSMMRKDELLAFAQ 198
           PGH+FPL+AKEGGVL RNGHTEATVDL RLAGLK GLCCEIMA+DG+MM +L FA
Sbjct: 126 PGHMFPLIAKEGGVLVRNGHTEATTOIARLAGLKHAGLCCE1MADDGTMMTMPDLQKFAV 185

Query: 199 KHDLAIATIKQLQDYRRQEEGGVWEIEIQLPTQFGHFTAYGYSEWANKEHVALVKGDI 258
           +H++ TI+QLQ+YRR+ + V + +++PT++G F A+ + EV++ KEHVALVKGD+
Sbjct: 186 EHNMPFITIQQLQEYRRKHDSLVKQISWKMPTKYGEFMAHSFVEVISGKEHVALVKGDL 245

Query: 259 SSGEDVLCRLHSECLTGDVFHSLRCDCGEQLANALQQIEAEGRGVLLYMRQEGRGIGLIN 318
           + GE VL R+HSECLTGD F S RCDCG+Q A A+ QIE EGRGV+LY+RQEGRGIGLIN
Sbjct: 246 TDGEQVLARIHSECriTGDAFGSQRCDCGQQFAAAMTQIEQEGRGVILYLRQEGRGIGLIN 305

Query: 319 KLKAYHLQEEGLDTLEANLALGFEGDERDYGVSAQLLKDLGINSINLLTNNPDKIQQLEA 378
           KL+AY LQ++G+DT+EAN+ALGF+ DER+Y + AQ+ + LG+ SI LLTNNP KI+ L+
Sbjct: 306 KLRAYELQDKGMDTVEANVALGFKEDEREYYIGAQMFQQLGVKSIRLLTNNPAKIEGLKE 365

Query: 379 EGICVKNRVPLQVAVTAYDLNYLKTKKEKMGHX.LD 413
           +G+ + R P+ V D++YLK K+ KMGH+ +
Sbjct: 366 QGIiNIVAREPIIVEPNKNDIDYLKVKQIKMGHMFN 400





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
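
The percentages printed on each "Identities" line appear to be truncated rather than rounded (for instance 307/395 is 77.7%, reported as 77%). The following minimal Python sketch, which is illustrative only and not part of the original analysis, parses such a summary line and recomputes the figures; the names parse_summary and truncated_percent are assumptions introduced here.

```python
import re

# Matches e.g. "Identities = 230/395 (58%)"; tolerant of extra spaces.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)

def parse_summary(line):
    """Return {label: (count, length, reported_percent)} for one summary line."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in SUMMARY_RE.finditer(line)
    }

def truncated_percent(count, length):
    """Integer percentage, discarding the fractional part."""
    return 100 * count // length

if __name__ == "__main__":
    # Summary line copied from the alignment in Example 332.
    line = "Identities = 230/395 (58%) , Positives = 307/395 (77%)"
    for label, (count, length, reported) in parse_summary(line).items():
        print(label, truncated_percent(count, length), reported)
```

The recomputed and reported values agree for this line; a mismatch elsewhere would point to a transcription error in the listing.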

Example 333 

A DNA sequence (GBSx0363) was identified in S.agalactiae <SEQ ID 1079> which encodes the amino 
acid sequence <SEQ ID 1080>. This protein is predicted to be riboflavin synthase alpha chain (ribE).
Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3517 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9723> which encodes amino acid sequence <SEQ ID 9724> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05274 GB:AP001512 riboflavin synthase alpha subunit [Bacillus halodurans] 
Identities = 98/216 (45%) , Positives = 147/216 (67%) , Gaps = 2/216 (0%) 



Query: 1   MFTGIIEEMGQVSRIRNGIKSQQLSIDAPKLVPLLRKGDSVAVNGVCLTVLDKSETAFIA 60
           MFTGIIE++G + 1+ ++ ++I + K+V ++ GDS+AVNGVCLTV ++T F
Sbjct: 1   MFTGIIEDVGTIDAIQQTGFAIVMTITSKKIVSDVQLGDSIAVNGVCLTVTSFTDTQFTV 60

Query: 61  DvMPES^WKTSLAALRLHSKVNLELALRSDSRLGGHFvLGHVDGVGKIEKIQKDDIAVRF 120
           D+MPE++ TSL L S+VNLE A+ ++ R GGH V GHVDG+G I K ++ D AV +
Sbjct: 61  DLMPETvRATSLRLLSKGSRVNLERAMVANGRFGGHIVSGHVDGIGTIRKKERKDNAVYY 120

Query: 121 SIDAPPSIMSYIIEKGSVALDGISLTWSFTEHSFEWSVTPHTMAQTNLSLKKVGDLMI 180
           +1+ S+ Y+I KGSVA+DG SLT+ ++ +F +S+IPHTM +T + LKK GD++NI
Sbjct: 121 TIEVSSSLRRYMIHKGSVAVDGTSLTIFDVSDKTFTISIIPHTMEETIIGLKKAGDIVNI 180

Query: 181 EVDVLGKYAEKFLAPTNRTNHTSSVMDWSFLSENGY 216
           E D++GKY E+F+ N + +FL+E+GY
Sbjct: 181 ECDLIGKYIEQFVQQGKPVNEGG--LTKAFLTEHGY 214

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 334 

A DNA sequence (GBSx0364) was identified in S.agalactiae <SEQ ID 1081> which encodes the amino 
acid sequence <SEQ ID 1082>. This protein is predicted to be riboflavin-specific deaminase (ribD).
Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.01 Transmembrane 307 - 323 ( 307 - 323)

Final Results

bacterial membrane --- Certainty=0.1404 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA86522 GB:U27202 riboflavin-specific deaminase [Actinobacillus
pleuropneumoniae]
Identities = 182/353 (51%), Positives = 259/353 (72%)

Query: 6   DYMAlALKEAEKGMGFVAPNPLVGAVIVKDDRIISKGYHKRFGDLHftERQAIICNADEDIS 65
           DYM A+ A++G+G+ PNPLVG VIVK+ I+++GYH++ G HAER A+ + ED+S
Sbjct: 51  DYMRRAIAIjyCQGLGWTNPNPLVGCVIVKNGEIVAEGYHEKIGGWHAERNAvLHCKEDLS 110

Query: 66  GSTLYVTLEPCCHVGKQPPCTEALI KSGI KKVWGSLDPNPLVSGKGIALLRKEGLNVEV 125
           G+T YVTLEPCCH G+ PPC++ LI+ GIKKV +GS DPNPLV+G+G LR+ G+ V
Sbjct: 111 GATAYVTLEPCCHHGRTPPCSDLLIERGIKKVFIGSSDPNPLVAGRGANQLRQAGVEWE 170

Query: 126 GILREECDALNERFIFHMTYKQPFWLKYAMTLDGKIATKTGDSKWISNEHSRQSVQKLR 185
           G+L+EECDALN F ++ K+P+V +KYAMT DGKIAT +G+SKWI+ E +R VQ+ R
Sbjct: 171 GLLKEECDALNPIFFHYIQTKRPYVLMKYAMTADGKIATGSGESKWITGESARARVQQTR 230

Query: 186 QKCSAI^GINTVLADNPRLTCRIPKGEALVRIVCDSQLKIPLDSYLVKSAKTIPTWIAT 245
           + SAIIWG++TVLADNP L R+P + VRIVCDSQL+ PLD LV++AK T IAT
Sbjct: 231 HQYSAIMVGVDTVIJyjNPMIJSISRMPNAKQPVRIVCDSQLRTPLDCQLVQTAKEYRTVIAT 290

Query: 246 CSDNLAQQQTLKEMGCRLIKVPRKDGKLDLKVLMTILGQEGIDSLLIEGGSSLHFSALKA 305
           SD+L + + + +G ++ ++ ++DL+ L+ LG+ IDSLL+EGGSSL+FSAL++
Sbjct: 291 VSDDLQKIEQFRPLGVDVLVCKARNKRVDLQDLLQKLGEMQIDSLLLEGGSSLNFSALES 350

Query: 306 GI VNRLI VFIAPKI IGGLKAKTAISGEGLDWLNQAFRVKDIELSRMDSDWIE 358
           GIVNR+ +IAPK++GG +AKT I GEG+ ++QA ++K + D++++
Sbjct: 351 GIVNRVHCYIAPKLVGGKQAKTPIGGEGIQQIDQAVKLKLKSTELIGEDILLD 403


A related DNA sequence was identified in S.pyogenes <SEQ ID 1083> which encodes the amino acid
sequence <SEQ ID 1084>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.17 Transmembrane 88 - 104 ( 88 - 105)

Final Results

bacterial membrane --- Certainty=0.1468 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB11794 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 71/161 (44%) , Positives = 109/161 (67%) 

Query: 13 LEEQTYFMQEMKE^KSLQKAEIPIGCTIVKDGEIIGRGHNAREESNQAIMHAEMMAIN 72 

+ + +M+EA+KEA+K+ +K E+PIG V+V +GEII R HN RE ++I HAEM+ 1+ 
Sbjct: 1 MTQDELMKEAIKEAKKAEEKGEVPIGAVLVINGEIIARAHNLRETEQRSIAHAEMLVID 60 

Query: 73 EftNMEGNWRLLDTTLFVTlEPOTMCSGAIGLaRIPHVIYGASNQKFGGVDSLYQILTDE 132 

EA G WRL TL+VT+EPC MC+GA+ L+R+ V++GA + K G +L +L +E 
Sbjct: 61 EACK^GTWRLEGATLYVTLEPCPMCAGAVVLSRVEKVVFGAFDPKGGCSGTLMNLLQEE 120 

Query: 133 RLNHRVQVERGLLAADCANIMQTFFRQGRERKKIAKHLIKE 173 

R NH+ +V G+L +C ++ FFR+ R++KK A+ + E 
Sbjct: 121 RFNHQAEWSGVLEEECGGMLSAFFRELRKKKKAARKNLSE 161 



An alignment of the GAS and GBS proteins is shown below: 

Identities = 48/146 (32%), Positives = 71/146 (47%), Gaps = 21/146 (14%)


Query: 7 YMALALKEAEKGMGFVAPNPLVGAVIVKDDRIISKGYHKRFGD LHAERQAI KNADE 62 

+M ALKEAEK + A P +G VIVKD II +G++ R +HAE AI A+ 

Sbjct: 19 FMQEALKEAEKSLQ-KAEIP-IGCVIVKDGEIIGRGHNAREESNQAIMHAEMMAINEANA 76 

Query: 63 D ISGSTLYVTLEPCCHVGKQPPCTEALIKSGIKKVWGSLDPNPLVSGKGIALLR 117 

+ +TL+VT+EPC C+ A+ +1 V+ G+ + +L 

Sbjct: 77 HEGNWRLLDTTLFVTIEPCV MCSGAIGLftRIPHVIYGASNQKFGGVDSLYQILT 130 

Query: 118 KEGLN VEVGILREECDALNERF 139 

E LN VE G+L +C + + F 
Sbjct: 131 DERLNHRVQVERGLLAADCANIMQTF 156 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 335 

A DNA sequence (GBSx0365) was identified in S.agalactiae <SEQ ID 1085> which encodes the amino 
acid sequence <SEQ ID 1086>. This protein is predicted to be Nramp metal ion transporter. Analysis of this 
protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.89 Transmembrane 169 - 185 ( 160 - 191)
INTEGRAL Likelihood =-11.09 Transmembrane 140 - 156 ( 128 - 165)
INTEGRAL Likelihood = -6.85 Transmembrane 359 - 375 ( 354 - 379)
INTEGRAL Likelihood = -6.48 Transmembrane 269 - 285 ( 263 - 287)
INTEGRAL Likelihood = -6.16 Transmembrane 426 - 442 ( 423 - 445)
INTEGRAL Likelihood = -5.57 Transmembrane 62 - 78 ( 58 - 80)
INTEGRAL Likelihood = -4.94 Transmembrane 107 - 123 ( 103 - 127)
INTEGRAL Likelihood = -4.46 Transmembrane 391 - 407 ( 389 - 408)
INTEGRAL Likelihood = -4.35 Transmembrane 310 - 326 ( 307 - 328)

Final Results

bacterial membrane --- Certainty=0.5755 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF83825 GB:AE003939 manganese transport protein [Xylella fastidiosa]
Identities = 192/436 (44%), Positives = 274/436 (62%), Gaps = 14/436 (3%)



Query: 10
           SL E++ SV V + L AFLGPG +V+VGYMDPGNW T + GG+ + Y+LL V+L
Sbjct: 39

Query: 70
           +S++MA+ LQ +A +LGI + DLAQA +R + L+++ ELA+IA DLAEVIG+A
Sbjct: 99

Query: 130
           IAL+LL G P++ ++IT +DV L+LLLM G + +EAFV L+L I F +VL+ P
Sbjct: 159

Query: 190
           L + GF+P ++ I) LA+GI+GATVMPHNLYLHSS+ QTR
Sbjct: 219 PLQEVLGGFVPRWQW ADPQALYLAIGI VGATVMPHNLYLHSS I VQTRAYP - RTP 272

Query: 250
           + A+R+ DS 4 L LA +N+ +L+L A++F+ H D+ Q Y L+
Sbjct: 273

Query: 309
           G A TLFA ALLASG NST+T TL GQIVMEGFL +L WL R+ TR L ++P+
Sbjct: 333 A TLFATALLASGINSTVTATLAGQIVMEGFLRLRLRPWLRRVLTRGLAIVPV 387

Query: 369
           V+ L G E +L++ SQV LS+ LPF++ PL+ + + +MG +W +A+
Sbjct: 388 IVWALYG--EQGTGRLLLLSQVILSMQLPFAVIPLLRCVADRKVMGALVAPRWLMWAW 445

Query: 429 LVAIILTLLNLKLIMD 444
           L+A ++ +LN+KL+ D
Sbjct: 446 LIAGVIWLNVKLLGD 461

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 336 

A DNA sequence (GBSx0366) was identified in S.agalactiae <SEQ ID 1087> which encodes the amino
acid sequence <SEQ ID 1088>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-14.12 Transmembrane 113 - 129 ( 98 - 132)
INTEGRAL Likelihood =-12.15 Transmembrane 228 - 244 ( 220 - 249)
INTEGRAL Likelihood =-10.83 Transmembrane 175 - 191 ( 167 - 195)
INTEGRAL Likelihood = -5.04 Transmembrane 57 - 73 ( 55 - 75)
INTEGRAL Likelihood = -3.93 Transmembrane 146 - 162 ( 142 - 166)
INTEGRAL Likelihood = -1.38 Transmembrane 199 - 215 ( 199 - 215)
INTEGRAL Likelihood = -0.32 Transmembrane 82 - 98 ( 82 - 98)

Final Results

bacterial membrane --- Certainty=0.6647 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF11325 GB:AE002018 hypothetical protein [Deinococcus radiodurans] 
Identities = 63/215 (29%), Positives = 108/215 (49%), Gaps = 13/215 (6%) 

Query: 11  LLLVFILTIIVNYLSATGFLTGNSQKSLSDRYQTLLTPAPLAFSIWSVIYL-LTFLVILR 69
           LL +LT++VNYLS L GNS +SDR TPA L F++W I+L L + +
Sbjct: 10  LIAATVLTLVVNYLSNALPLFGNSNAEVSDRLPNAFTPA6LTFTVWGPIFLGLLVFAVYQ 69

Query: 70  AIFSKSQSYQDNFASIFPYFLGLLLVNNIWTVFFTSNL1GLSTIIIFAYCILLV-IIIKI 128
           A+ ++ + D +P+ LG LL N W + F S IGLS +1+ A +LV + + +
Sbjct: 70  ALPAQRGARLDRL--FWPFLLGNLL-NVAWLIAFQSI 1 NIGLSWIMIALIAVLVRLYLSV 126

Query: 129 LS---KNKSKLLLRITFGIHAGWLLVASLVNLAVYLVKI DFNYPLPKVYIAI IALI 181
           S + + L++ ++ W+ VA++ N+ +LV F V+ A++ ++
Sbjct: 127 RSLPPQGAERWTLQIiPVSLYLAWISVATIANITAFLVSAGVTQSFLGIAGPVWSALIiLW 186

Query: 182 FITVLSLYLARVLQNAYLILSVFWAWLMVFKAHLE 216
           + +L R A+ + + VJA+ V+ A E
Sbjct: 187 AAAIGVFFLWRFRDYAFAA.V-LLWAFYGVYVARPE 220


No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 337 

A DNA sequence (GBSx0367) was identified in S.agalactiae <SEQ ID 1089> which encodes the amino
acid sequence <SEQ ID 1090>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3401 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC65352 GB:AE001215 T. pallidum predicted coding region 
TP0352 [Treponema pallidum] 
Identities = 28/64 (43%) , Positives = 41/64 (63%) 

Query: 3   EFTFEIVEKLLVLSENEKGWTKELNRVSFNGAPAKFDLRTWSPDHTKMGKGITLSNEEFK 62
           +F +E+ LS + GW+ EL +S+NG P K+D+R WSPD +KMGKG+TL+ E
Sbjct: 12  DFHYEVTRNWGTLSTSGNGWSLELKSISWNGRPEKYDIRAWSPDKSKMGKGVTLTRAEIV 71

Query: 63  VILD 66
           + D
Sbjct: 72  ALRD 75

A related DNA sequence was identified in S.pyogenes <SEQ ID 1091> which encodes the amino acid
sequence <SEQ ID 1092>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4021 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 59/70 (84%), Positives = 64/70 (91%)

Query: 1 MSEFTFEIVEKLLVLSENEKGWTKELNRVSFNGAPAKFDLRTWSPDHTKMGKGITLSNEE 60

M+EFTF I E LL LSEN+KGWTKELNRVSFNGA AK+D+RTWSPDHTKMGKGITL+NEE 
Sbjct: 1 MAEFTFNIEEHLLTLSENDKGWTKELNRVSFNGAEAKWDIRTWSPDHTKMGKGITLTNEE 60 
Query: 61 FKVILDAFRK 70

FK ILDAFRK 
Sbjct: 61 FKTILDAFRK 70 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 338 

A DNA sequence (GBSx0368) was identified in S.agalactiae <SEQ ID 1093> which encodes the amino 
acid sequence <SEQ ID 1094>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.66 Transmembrane 92 - 108 ( 92 - 110) 
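
Each INTEGRAL line carries a likelihood score, a predicted transmembrane segment and a second range in brackets. The following minimal Python sketch, which is illustrative only, extracts those fields; the name parse_integral and the dictionary keys are assumptions introduced here, and no interpretation of the bracketed range is implied beyond what the listing prints.

```python
import re

# Tolerates the variable spacing seen in the listings, e.g. "=-16.93" or "( 1-31)".
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_integral(line):
    """Return the fields of one INTEGRAL line, or None if it does not match."""
    match = TM_RE.search(line)
    if match is None:
        return None
    return {
        "likelihood": float(match.group(1)),
        "segment": (int(match.group(2)), int(match.group(3))),
        "bracketed": (int(match.group(4)), int(match.group(5))),
    }

def segment_length(segment):
    """Number of residues in an inclusive (start, end) range."""
    start, end = segment
    return end - start + 1

if __name__ == "__main__":
    line = "INTEGRAL Likelihood = -2.66 Transmembrane 92 - 108 ( 92 - 110)"
    fields = parse_integral(line)
    print(fields, segment_length(fields["segment"]))
```

On the line above the segment 92 to 108 spans 17 residues, as do the other segments reported in these examples.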

Final Results

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB14676 GB:Z99117 similar to protease [Bacillus subtilis]
Identities = 201/407 (49%), Positives = 277/407 (67%), Gaps = 2/407 (0%)

Query: 4   VKKRPEVLSPAGTLEKLKVAIDYGADAVFVGGQAYGLRSRAGNFSMEELQEGINYAHARD 63
           + K+PE+L+PAG LEKLK+A+ YGADAVF+GGQ YGLRS A NF++EE+ EG+ +A
Sbjct: 18  ITKKPELLAPAGNLEKLKIAVHYG3U3AVFIGGQEYGLRSNADNFTIEEIAEGVEFAKKYG 77

Query: 64  AKVYVAANMVTHEGNELGAGPWFRELRDMGLiaVIVSDPALIVICATEAPGLEIHLSTQA 123
           AK+YV N+HNG ++LD + +IV+DP +1 C AP +E+HLSTQ
Sbjct: 78  AKIYVTTNI FAHNENMDGLEDYLKALGDANVAGI IVADPLI IETCRRVAPNVEVHLSTQQ 137

Query: 124 SSTNYETFEFWKEMGLTRVVIAREVTMAEIAEIRKRTDVEIEAFVHGAMCISYSGRCVLS 183
           S +N++ +FWKE GL RWLARE + E+ E++++ D+EIE+F+HGAMCI+YSGRCVLS
Sbjct: 138 SLSNWKAVQFWKEEGLDRVVLARETSALEIREMPCEKVDIEIESFIHGAMCIAYSGRCVLS 197

Query: 184 NHMSHRDANRGGCSQSCRWKYDLYDMPFGQERQSLKGEIPEPFSMSAVDMCMIEHIPDMI 243
           NHM+ RD+NRGGC QSCRW YDLY G +L GE PF+MS D+ +IE IP MI
Sbjct: 198 NHMTARDSNRGGCCQSCRWDYDLYQTD-GANAVALYGEEDAPFAMSPKDLKLIESIPKMI 256

Query: 244 ENGVDSLKIEGRMKSIHYVSTVTNCYKAAVDAYMESPEAFEAIKEDLIDELWKVAQRELA 303
           E G+DSLKIEGRMKSIHYV+TV + Y+ +DAY PE F I+++ ++EL K A R+ A
Sbjct: 257 EMGIDSLKIEGRMKSIHYVATWSWRKVIDAYCADPENF-VIQKEWLEELDKCANRDTA 315

Query: 304 TGFYYHTPTENEQLFGARRKIPQYKFVGEWSFDNAKMEATIRQRNVIMEGDRVEFYGPG 363
           T F+ TP EQ+FG K Y FVG V+++D T++QRN +GD VEF+GP
Sbjct: 316 TAFFEGTPGYEEQMFGEHAKKTTYDWGLvIl^IYDEDTQ^IVTLQQRNFFKKGDEVEFFGPE 375

Query: 364 FRHFECFIDGLRDAEGNKIDRAPNPMELLTITLPNPVKKGDMIRACK 410
           +F 1+ + D +GN++D A +P++++ L + +M+R K
Sbjct: 376 IENFTHTIETIWDEDGNELDAARHPLQIVKFKLDKKIYPSNMMRKGK 422


A related DNA sequence was identified in S.pyogenes <SEQ ID 1095> which encodes the amino acid
sequence <SEQ ID 1096>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.66 Transmembrane 92 - 108 ( 92 - 110)

Final Results

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB04993 GB:AP001511 protease [Bacillus halodurans]
Identities = 201/403 (49%), Positives = 280/403 (68%), Gaps = 4/403 (0%)

Query: 6   KRPEVLSPAGTLEKLKVAIDYGADAVFVGGQAYGLRSRAGNFSMEELQEGIDYAHARGAK 65
           K+PE+L+PAG+LEKLKVAI YGADAV++GGQ +GLRS A NFS+EE++EG+++A+ GAK
Sbjct: 17  KKPELLAPAGSLEKLKVAIHYGADAVYIGGQEFGLRSNADNFSIEEMREGVEFANKYGAK 76

Query: 66  VYVAANMVTHEGNEIGAGEWFRQLRDMGLDAVIVSDPALIVICSTEAPGLEIHLSTQASS 125
           VYV N+ H N G E+ L+++G+ +IV+DP +1 C AP +E+HLSTQ S
Sbjct: 77  VYTOTOIYAHNENMDGLEEYLSALQEVGVTGIIvADPLIIETCKRVAPKVEVHLSTQQSrj 136

Query: 126 TNYETFEFWKAMGLTRWLAREVNMAELAEIRKRTD^ 185
           +N+ +FWK GL RWLAREV + E+ E++K D+EIE FVHGAMCI S YSGRCVLSNH
Sbjct: 137 SNWIAVKFWKEEGLHRVVIAREVGLEEIttEMKKHVDIEIETFVHGAMCISYSGRCVLSNH 196

Query: 186 MSHRDANRGGCSQSCRWKYDLYDMPFGGE - RRSLKGEI PEDYSMSSVDMCMIDHIPDIiIE 244
           M+ RD+NRGGC QSCRW YDLY+ E +G++P Y+MS D+ +1 IP LIE
Sbjct: 197 MTARDSNRGGCCQSCRWDYDLYEQQDSAEIPLFAEGDVP- -YTMSPKDLNLIQAIPQLIE 254

Query: 245 NGVDSLKIEGRMKSIHWSTVTNCYKAAVGAYMESPEAFYAIKEELIDELWKVAQREIAT 304
           G+DSLK+EGRMKSIHYV+TVT+ Y+ + AY P+ F IK E ++EL K A R+ A
Sbjct: 255 AGIDSLKATEGRMKSIHYVATVTSVYRKVIDAYCSDPDNF-KIKREWLEELEKCANRDFAP 313

Query: 305 GFYYGI PTENEQLFGARRKI PQYKFVGEWAFDSASMTATIRQRNVIMEGDRIECYGPGF 364
           F+ G PT EQ++G K +Y FVG V+ ++ + T++QRN +GD +E +GP
Sbjct: 314 QFFEGTPTYKEQMYGIHPKRTKYDFVGLVLDYNEKTGIVTLQQRNHFKQGDEVEFFGPEI 373

Query: 365 RHFETVVKDLHDADGQKIDRAPNPMELLTISLPREVKPGDMIR 407
           F V+ + D DG ++D A +P++++ + ++V P +M+R
Sbjct: 374 NRFTQTVEKIWDEDGNELDAARHPLQIVKFKVDQKVYPQNMMR 416

An alignment of the GAS and GBS proteins is shown below:

Identities = 386/427 (90%), Positives = 404/427 (94%)

Query: 1   MSNVKKRPEVLSPAGTLEKLKVAIDYGADAVFVGGQAYGLRSRAGNFSMEELQEGINYAH 60
           MS++KKRPEVLSPAGTLEKLKVAIDYGADAVFVGGQAYGLRSRAGNFSMEELQEGI+YAH
Sbjct: 1   MSHMKITCPEVLSPAGTLEKLKVAIDYGADAVFVGGQAYGLRSRAGNFSMEELQEGIDYAH 60

Query: 61  ARDAKA7YVAANMVTHEGNELGAGPWFRELRDMGLDAVIVSDPALIVICATEAPGLEIHLS 120
           AR AKVYVAANMVTHEGNE+GAG WFR+LRDMGLDAVIVSDPALIVIC+TEAPGLEIHLS
Sbjct: 61  ARGAKVYVAANMVTHEGNEIGAGEWFRQLRDMGLDAVIVSDPALIVICSTEAPGLEIHLS 120

Query: 121 TQASSTNYETFEFWKEMGLTRVVtAREOTMAELAEIRKRTDVEIEAFVHGAMCISYSGRC 180
           TQASSTNYETFEFWK MGLTRWLAREV MAELAEIRKRTDVEIEAFVHGAMCISYSGRC
Sbjct: 121 TQASSTNYETFEFWKAMGLTRVVIAREVNMAELAEIRKRTDvEIEAFvHGAMCISYSGRC 180

Query: 181 VI.SNHMSHRDANRGGCSQSCRWKYDLYDMPFGQERQSLKGEIPEPFSMSAVDMCMIEHIP 240
           VLSNHMSHRDANRGGCSQS CRWKYDL YDMPFG ER+SLKGEIPE +SMS+VDMCMI+HIP
Sbjct: 181 VLSNHMSHRDANRGGCSQSCRWKYDLYDMPFGGERRSLKGEIPEDYSMSSVDMCMIDHIP 240

Query: 241 DMIENGVDSLKIEGRMKSIHWSTVTNCTKAAVTJAYMESPFjAFFAIKEDLIDELWKVAQR 300
           D+IENGVDSLKIEGRMKSIHYVSTVTNCYKAAV AYMESPEAF AIKE+LIDELWKVAQR
Sbjct: 241 DLIENGVDSLKIEGRMKSIHYVSTVTNCYKAAVGAYMESPEAFYAIKEELIDELWKVAQR 300

Query: 301 EIATGFYYHTPTENEQLFGARRKIPQYK5VGEWSFDNAKMEATIRQRNVIMEGDRVEFY 360
           ELATGFYY PTENEQLFGARRKI PQYKFVGEW+FD+A M ATIRQRNVIMEGDR+E Y
Sbjct: 301 EIATGFYYGIPTENEQLFGARRKIPQYKFVGEVVAFDSASMTATIRQRNVIMEGDRIECY 360

Query: 361 GPGFRHFECFIDGLRDAEGNKIDRAPNPMELLTITLPNPVKKGDMIRACKEGLYNLYQND 420
           GPGFRHFE + L DA+G KIDRAPNPMELLTI+LP VK GDMIRACKEGLVNLYQ D
Sbjct: 361 GPGFP3FETVVKDLHDADGQKIDRAPNPMELLTISLPREVKPGDMIRACKEGLVNLYQKD 420

Query: 421 GTSKTVR 427
           GTSKTVR
Sbjct: 421 GTSKTVR 427

SEQ ID 1094 (GBS385) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 69 (lane 3; MW 50kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 7; MW 75.7kDa).

The GBS385-GST fusion product was purified (Figure 213, lane 7) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 312), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 339

A DNA sequence (GBSx0369) was identified in S.agalactiae <SEQ ID 1097> which encodes the amino
acid sequence <SEQ ID 1098>. This protein is predicted to be collagenase. Analysis of this protein
sequence reveals the following:

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2208 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14677 GB:Z99117 similar to protease [Bacillus subtilis]
Identities = 92/304 (30%), Positives = 161/304 (52%), Gaps = 5/304 (1%)

Query: 1   MEKIILTATAESIEQVKQLLAIGIDRIYVGEENYGLRLPHSFSDDELREIAKLVHDAGKE 60
           M+K L T S + L+ G VGE+ YGLRL FS +++ + ++ H G +
Sbjct: 1   MKKPELLVTPTSTADILPLIQAGATAFLVGEQRYGLRLAGEFSREDVTKAVEIAHKEGAK 60

Query: 61  LTVACNALMHQEMMDNIKPFLELMKEINVDYLVVGDAGVFYINKRDGYNFKLIYDTSVFV 120
           + VA NA+ H++ + +L+EVDVGDV + +KL + T
Sbjct: 61  VWAVNAIFHNDKVGELGEYIAFLAEAGVDAAVFGDPA^ 120

Query: 121 TSSRQWFWGQHGAVETVLAREIPSEELFKMSENLEFPAEILVYGASVIHHSKRPLLQNY 180
           T+ N+WG+ GA +VLARE+ + + ++ EN E EI V+G + + SKR L+ NY
Sbjct: 121 TNYYTCNYVJGRKGAARSVLAREIiNMDS I VE I KENAEVE IEIQ VHGMTCMFQSKRSLIGNY 180

Query: 181 YNF---THITDEKTRERGLFLAEPGDPESHYSIYEDKHGTHIFINNDINMMTKVTELVEH 237
           + + + K +E G+FL + + ++ Y I+ED++GTHI ND+ ++ ++ EL++
Sbjct: 181 FEYQGKVMDIERKKKESGMFLHDK-ERDNKYPIFEDENGTHIMSPNDVCIIDELEELIDA 239

Query: 238 HFTHWKLDGIYCPGDNFVAIAEIFVETARL-IENGTFTQDQAFLFDERIRKLHPKGRGLD 296
           +K+DG+ + + + +++ E I> +EN + + + ERI + P R +D
Sbjct: 240 GIDSFKIDGVLKMPEYLIEOTKMYREAIDLCVENRDEYEAKKEDWIERIESIQPVNRKID 299

Query: 297 TGFY 300
           TGF+
Sbjct: 300 TGFF 303

A related GBS nucleic acid sequence <SEQ ID 10949> which encodes amino acid sequence <SEQ ID
10950> was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1099> which encodes the amino acid
sequence <SEQ ID 1100>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.1716 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 245/308 (79%) , Positives = 273/308 (88%) 

Query: 1   MEKIILTATAESIEQVKQLIAIGIDRIYVGEENYGLRLPHSFSDDELREIAKLVHDAGKE 60
           MEKI I +TATAES IEQVK LLA G+DRIYVGE NYGLRLPH+FS DELR+ I AKLVHDAGKE
Sbjct: 1   MEKIIITATAESIEQVKALLAAGVDRIYVGEANYGLRLPHNFSYDELRQIAKLVHDAGKE 60

Query: 61  LWAOTAL^fflQE^™NIKPFLELMKEII^fVI)YLWGDAGVFyINKRDGYNFKLIYDTSVFV 120
           LTVACNALMHQ+MMD IKPFL+LM EI VDYLWGDAGVFY+NKRDGYNFKLIYDTSVFV
Sbjct: 61  LTOACNALMHQDMDQIKPFLDLMIEIAVTjyLWGDAGVFYVNKRDGYNFKLiyDTSVFV 120

Query: 121 TSSRQWFWGQHGAVETVLAREIPSEELFKMSENLEFPAEILVYGASVIHHSKRPLLQNY 180
           TSSRQVNFWGQHGAVE+VLAREIPS ELF ++ENLEFPAE+LVYGASVIHHSKRPLL+NY
Sbjct: 121 TSSRQVNFWGQHGAVESVIAREIPSAELFTLAENLEFPAEVLVYGASVIHHSKRPLLENY 180

Query: 181 YNFTHITDEKTRERGLFLAEPGDPESHYSIYEDKHGTHIFINNDINMMTKVTELVEHHFT 240
           Y+FT I DE +RERGLFLAEPGD SHYSIYED HGTHIFINNDI+MM+K+ EL H T
Sbjct: 181 YHFTKIDDEWSRERGLFLAEPGDASSHYSIYEDNHGTHIFINNDIDMMSKLGELYAHGLT 240

Query: 241 HWKLDGIYCPGDNFVAIAEIFVETARLIENGTFTQDQAFLFDERIRKLHPKGRGLDTGFY 300
           HWKLDGIYCPGD+FVAI ++F++ L+E G FTQ++A D+ + HP GRGLDTGFY
Sbjct: 241 HWKLDGIYCPGDDFVAITKLFIQAKTLLEMQFTQEEAEKLDQAvHAHHPAGRGLDTGFY 300

Query: 301 DFDPSTVK 308
           +FDP TVK
Sbjct: 301 EFDPKTVK 308

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 340

A DNA sequence (GBSx0371) was identified in S.agalactiae <SEQ ID 1101> which encodes the amino 
acid sequence <SEQ ID 1102>. This protein is predicted to be cDNA EST yk542cl2.5 comes from this 
gene. Analysis of this protein sequence reveals the following: 

Possible site: 16 
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



>GP:AAD15622 GB:U75480 unknown [Streptococcus mutans] 
Identities = 69/152 (45%) , Positives = 101/152 (66%) , Gaps = 12/152 (7%) 

Query: 1 MSKLFKTLVISAASGAAARYFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYS 60 

MSK KT +1 A +GAAAAYFL+T KGK+ +K + + +YKENP+EYHQ A DK +EY 
Sbjct: 1 MSKFLKTAIIGAGTGAARAYFLSTDKGKQFKKKIHQTFTDYKENPKEYHQYAADKVNEYK 60 
Query: 61 NIAVDTFKDYKGKFESGELTTEDIVSAVKEKSGEWDFANDFVNQAKSKFSDEDTAKKED 120

++AV +FKDYK KFE+GELT ++I+S+VKEK+ + FAN ++Q K + T +K + 
Sbjct: 61 DVATOSFKDyKDKFETGELTKDNIISSVKEKASQftGKFANSKLSQVKDHLA--QTVEKAE 118 

Query: 121 KAP ETKVEDIVIDYKENTEDKE 142 

+ + +V+DIVIDY+ + K+ 

Sbjct: 119 ASTNDAGIPLGEMKAQVDDIVIDYQAEEKTKK 150 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1103> which encodes the amino acid 
sequence <SEQ ID 1104>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.81 Transmembrane 15 - 31 ( 14 - 31)

Final Results

bacterial membrane --- Certainty=0.1723 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9117> which encodes the amino acid sequence
<SEQ ID 9118>. Analysis of this protein sequence reveals the following:

Possible cleavage site: 19

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.300 (Affirmative) < succ>
bacterial membrane --- Certainty=0.000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 69/140 (49%) , Positives = 91/140 (64%) , Gaps = 8/140 (5%) 

Query: 1 MSKLFKTLVISAASGAAARYFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYS 60 

M+K FK LVI A SG AAAYFL+T+KGK L+ AEK Y YKE+P++YHQ AK+K SEYS 
Sbjct: 8 MNKSFKNLVIGAVSGVAAA.YFLSTEKGKALKNRAEKAYQAYKESPDDYHQFAKEKGSEYS 67 

Query: 61 NLAVDTFKDYKGKFESGELTTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSD-EDTAKKE 119 

+LA DTF D K K SG+LT ED++ +K+K+ FV + K ++ E K++ 

Sbjct: 68 HLARDTFYDVKDKLASGDLTKEDMLDLLKDKT TAFVQKTKETLAEVEAKEKQD 120 

Query: 120 DKAPETKVEDIVIDYKENTE 139 

D + EDI+IDY E E 
Sbjct: 121 DVIIDLNEEDIIIDYTEQDE 140

SEQ ID 1102 (GBS164) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 30 (lane 4; MW 17.4kDa).

The GBS164-His fusion product was purified (Figure 115A; see also Figure 200, lane 4) and used to
immunise mice (lane 1+2+3 product; 20µg/mouse). The resulting antiserum was used for Western blot,
FACS (Figure 115B), and in the in vivo passive protection assay (Table III). These tests confirm that the
protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 341

A DNA sequence (GBSx0372) was identified in S.agalactiae <SEQ ID 1105> which encodes the amino 
acid sequence <SEQ ID 1106>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
5 >>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-16.93 Transmembrane 6 - 22 ( 1-31) 

Final Results 

bacterial membrane Certainty=0.7771 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
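
The "Final Results" lines above each pair a predicted location with a certainty value and a call. As an illustration only, the following Python sketch reads such lines and picks the location with the highest certainty. The function names `parse_final_results` and `top_site` are hypothetical and are not part of the analysis described in this document; the pattern is an assumption about the line layout and is written to tolerate stray spaces inside the number.

```python
import re

# Assumed line layout: "bacterial <site> Certainty=<value> (<call>) ...".
# The value pattern tolerates spaces inside the number, e.g. "0 . 7771".
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>[0-9][0-9 .]*)\((?P<call>[^)]*)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, call)} for every localization line in text."""
    results = {}
    for match in LINE_RE.finditer(text):
        value = float(match.group("value").replace(" ", ""))
        results[match.group("site")] = (value, match.group("call").strip())
    return results


def top_site(results):
    """Return the site with the highest certainty (first one wins on ties)."""
    return max(results, key=lambda site: results[site][0])


# Example 341, S.agalactiae protein, with extraction spacing left in place.
sample = (
    "bacterial membrane Certainty=0 . 7771 (Affirmative)\n"
    "bacterial outside Certainty=0 . 0000 (Not Clear)\n"
    "bacterial cytoplasm Certainty=0 . 0000 (Not Clear)\n"
)
results = parse_final_results(sample)
print(top_site(results), results[top_site(results)])
```

The same helper can be run over each example's block to tabulate predicted locations; it makes no claim about how the certainty values themselves are computed.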

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD15621 GB:U75480 unknown [Streptococcus mutans] 
Identities = 88/129 (68%) , Positives = 112/129 (86%)

Query: 1 MIEIAVLIIAIAFVVLVLGILFVLKKVSETIEETKQTIKVLTSDVNVTLYQTNEILAKAN 60
M EIA+LI+AIAF VLV+ ++ +L+K+S+T++E++QT+K+LTSDVNVTLYQTNE+LAKAN
Sbjct: 1 MWEIALLIVAIAFAVLVIYLILLLRKISDTVDESRQTLKILTSDVNVTLYQTNELLAKAN 60

Query: 61 VLVDDVNGKVSTIDPLFVAIADLSESVSDLNLQARHIGQKASSATSSVTKAGSALAIGKA 120
VLV+DVNGKV TIDPLF AIADLS SVSDLN QAR+ G+K +T++V KAG+A GK
Sbjct: 61 VLVEDVNGKVETIDPLFTAIADLSVSVSDLNRQARYFGKKTRKSTANVGKAGAAYTPGKV 120

Query: 121 ASKIFRKKG 129

ASK+FRKKG 
Sbjct: 121 ASKLFRKKG 129 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1107> which encodes the amino acid
sequence <SEQ ID 1108>. Analysis of this protein sequence reveals the following:

Possible site: 16 
>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -0.85 Transmembrane 18 - 34 ( 17 - 34) 

Final Results

bacterial membrane Certainty=0.1341 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAD15621 GB:U75480 unknown [Streptococcus mutans] 
Identities = 83/128 (64%) , Positives = 110/128 (85%) 

Query: 6 ISLMIIALAFVALVIFLIIVLKKVSETIDEAKKTISVLTSDVNVTLHQTNDILAKANILV 65
I+L+I+A+AF LVI+LI++L+K+S+T+DE+++T+ +LTSDVNVTL+QTN++LAKAN+LV
Sbjct: 4 IALLIVAIAFAVLVIYLILLLRKISDTVDESRQTLKILTSDVNVTLYQTNELLAKANVLV 63

Query: 66 EDVNGKVATIDPLFVAIADLSESLSDLNSQARHFGQKATNATGNVSKAGKLALVGKVASK 125
EDVNGKV TIDPLF AIADLS S+SDLN QAR+FG+K +T NV KAG GKVASK
Sbjct: 64 EDVNGKVETIDPLFTAIADLSVSVSDLNRQARYFGKKTRKSTANVGKAGAAYTFGKVASK 123

Query: 126 VFGKKGEK 133 

+F KKG++ 
Sbjct: 124 LFRKKGKQ 131 


An alignment of the GAS and GBS proteins is shown below: 

Identities = 92/131 (70%) , Positives = 116/131 (88%) 

Query: 1 MIEIAVLIIAIAFVVLVLGILFVLKKVSETIEETKQTIKVLTSDVNVTLYQTNEILAKAN 60 
++ I+++IIA+AFV LV+ ++ VLKKVSETI+E K+TI VLTSDVNVTL+QTN+ILAKAN
Sbjct: 3 LVGISLMIIALAFVALVIFLIIVLKKVSETIDEAKKTISVLTSDVNVTLHQTNDILAKAN 62

Query: 61 VLVDDVNGKVSTIDPLFVAIADLSESVSDLNLQARHIGQKASSATSSVTKAGSALAIGKA 120
+LV+DVNGKV+TIDPLFVAIADLSES+SDLN QARH GQKA++AT +V+KAG +GK
Sbjct: 63 ILVEDVNGKVATIDPLFVAIADLSESLSDLNSQARHFGQKATNATGNVSKAGKLALVGKV 122

Query: 121 ASKIFRKKGDK 131 

ASK+F KKG+K 
Sbjct: 123 ASKVFGKKGEK 133 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 342 

A DNA sequence (GBSx0373) was identified in S.agalactiae <SEQ ID 1109> which encodes the amino
acid sequence <SEQ ID 1110>. Analysis of this protein sequence reveals the following:

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0462 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 343 

A DNA sequence (GBSx0374) was identified in S.agalactiae <SEQ ID 1111> which encodes the amino
acid sequence <SEQ ID 1112>. This protein is predicted to be prolipoprotein diacylglyceryl transferase (lgt).
Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.39 Transmembrane 231 - 247 ( 225 - 251) 

INTEGRAL Likelihood = -7.64 Transmembrane 89 - 105 ( 87 - 107)

INTEGRAL Likelihood = -5.20 Transmembrane 18 - 34 ( 13 - 36) 

INTEGRAL Likelihood = -1.86 Transmembrane 46 - 62 ( 46 - 64) 
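
Each INTEGRAL line above reports a likelihood score, a core transmembrane span and, in parentheses, a wider span. A minimal Python sketch of how these lines could be read into a table is given here for illustration; `parse_tm_segments` and `span_length` are hypothetical helper names, and the pattern is an assumption based only on the line layout shown in this document.

```python
import re

# Assumed layout:
# "INTEGRAL Likelihood = <score> Transmembrane <a> - <b> ( <c> - <d>)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return the predicted segments as dicts, ordered by core start position."""
    segments = []
    for score, c0, c1, e0, e1 in TM_RE.findall(text):
        segments.append({
            "likelihood": float(score),
            "core": (int(c0), int(c1)),
            "extended": (int(e0), int(e1)),
        })
    return sorted(segments, key=lambda seg: seg["core"][0])


def span_length(span):
    """Number of residues in an inclusive (start, end) span."""
    start, end = span
    return end - start + 1


# The four segments reported for the GBSx0374 protein.
sample = """
INTEGRAL Likelihood = -8.39 Transmembrane 231 - 247 ( 225 - 251)
INTEGRAL Likelihood = -7.64 Transmembrane 89 - 105 ( 87 - 107)
INTEGRAL Likelihood = -5.20 Transmembrane 18 - 34 ( 13 - 36)
INTEGRAL Likelihood = -1.86 Transmembrane 46 - 62 ( 46 - 64)
"""
segments = parse_tm_segments(sample)
for seg in segments:
    print(seg["core"], span_length(seg["core"]), seg["likelihood"])
```

Sorting by position makes it easier to compare the segment layout of the S.agalactiae protein with that of the related S.pyogenes protein.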

Final Results 

bacterial membrane Certainty=0.4354 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9721> which encodes amino acid sequence <SEQ ID 9722> 
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC80171 GB:U75480 putative prolipoprotein diacylglycerol 
transferase [Streptococcus mutans] (ver 3) 
Identities = 184/257 (71%) , Positives = 226/257 (87%) 






Query: 2 MINPVAIRLGPFSIRWYAICIVSGMLLAVYLAMKEAPRKNIKSDDILDFILMAFPLSIVG 61 
MINP+AI+LGP +IRWY+ICIV+G++LAVYL ++EAP+KNIKSDD+LDFIL+AFPL+IVG 
Sbjct: 1 MINPIAIKLGPLTIRWYSICIVTGLILAVYLTIREAPKKNIKSDDVLDFILIAFPLAIVG 60

Query: 62 ARIYYVIFEWAYYSKHPVEIIAIWNGGIAIYGGLITGAILLVIFSYRRLINPIDFLDIAA 121 

AR+YYVIF+W YY K+P EI IW+GGIAIYGGL+TGA++L IFSY R+I PIDFLD+AA 
Sbjct: 61 ARLYYVIFDWDYYLKNPSEIPVIWHGGIAIYGGLLTGALVLFIFSYYRMIKPIDFLDVAA 120 

Query: 122 PGVMIAQAIGRWGNFINQEAYGRAVKNLNYVPNFIKNQMYIDGAYRVPTFLYESLWNFLG 181
PGVM+AQ+IGRWGNF+NQEAYG+ V LNY+P+FI+ QMYIDG YR PTFLYESLWN LG
Sbjct: 121 PGVMLAQSIGRWGNFVNQEAYGKTVTQLNYLPDFIRKQMYIDGHYRTPTFLYESLWNLLG 180

Query: 182 FVIIMSIRHRPRTLKQGEVACFYLVWYGCGRFIIEGMRTDSLYLAGLRVSQWLSVILVII 241
F+IIM +R RP LK+GEVA FYL+WYG GRF+IEGMRTDSL A LRVSQWLSV+LV++
Sbjct: 181 FIIIMILRRRPNLLKEGEVAFFYLIWYGSGRFVIEGMRTDSLMFASLRVSQWLSVLLVVV 240

Query: 242 GIVMIIYRRREQHISYY 258

G+++++ RRR I YY 
Sbjct: 241 GVILMVIRRRNHAIPYY 257 
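
The summary line of each alignment gives counts and percentages for identities, positives and, where present, gaps. The Python sketch below shows one way to read such a line and recompute the percentages; the helper names are hypothetical. For the two lines used here the reported figures agree with integer truncation of count/length, which the code treats as an assumption rather than a documented rule.

```python
import re

# Assumed layout: "<Name> = <count>/<length> (<percent>%)", repeated on one line.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_alignment_stats(line):
    """Return {name: (count, length, reported_percent)} for one summary line."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in STAT_RE.findall(line)
    }


def truncated_percent(count, length):
    """Integer percentage, truncated rather than rounded (an assumption)."""
    return count * 100 // length


# Summary line of the alignment with the Streptococcus mutans sequence above.
line = "Identities = 184/257 (71%) , Positives = 226/257 (87%)"
stats = parse_alignment_stats(line)
for name, (count, length, reported) in stats.items():
    print(name, count, length, reported, truncated_percent(count, length))
```

A mismatch between the reported and recomputed value is a quick way to spot a digit that was damaged in transcription.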

A related DNA sequence was identified in S.pyogenes <SEQ ID 1113> which encodes the amino acid
sequence <SEQ ID 1114>. Analysis of this protein sequence reveals the following:

Possible site: 28 
>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -7.01 Transmembrane 229 - 245 ( 222 - 249)
INTEGRAL Likelihood = -6.90 Transmembrane 45 - 61 ( 40 - 68)
INTEGRAL Likelihood = -4.41 Transmembrane 17 - 33 ( 11 - 35)
INTEGRAL Likelihood = -4.14 Transmembrane 87 - 103 ( 86 - 106)
INTEGRAL Likelihood = -0.27 Transmembrane 170 - 186 ( 170 - 186)

Final Results 

bacterial membrane Certainty=0.3803 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC80171 GB:U75480 putative prolipoprotein diacylglycerol
transferase [Streptococcus mutans] (ver 3)
Identities = 176/258 (68%) , Positives = 217/258 (83%) 

Query: 1 MINPIALKCGPLAIHWYALCILSGLVLAVYLASKEAPKKGISSDAIFDFILIAFPLAIVG 60 
MINPIA+K GPL I WY++CI++GL+LAVYL +EAPKK I SD + DFILIAFPLAIVG

Sbjct: 1 MINPIAIKLGPLTIRWYSICIVTGLILAVYLTIREAPKKNIKSDDVLDFILIAFPLAIVG 60 

Query: 61 ARIYYVIFEWSYYVKHLDEIIAIWNGGIAIYGGLITGALVLLAYCYNKVLNPIHFLDIAA 120 
AR+YYVIF+W YY+K+ EI IW+GGIAIYGGL+TGALVL + Y +++ PI FLD+AA 
Sbjct: 61 ARLYYVIFDWDYYLKNPSEIPVIWHGGIAIYGGLLTGALVLFIFSYYRMIKPIDFLDVAA 120

Query: 121 PSVMVAQAIGRWGNFINQEAYGKAVSQLNYLPSFIQKQMFIEGSYRIPTFLYESLWNLLG 180 

P VM+AQ+IGRWGNF+NQEAYGK V+QLNYLP FI+KQM+I+G YR PTFLYESLWNLLG
Sbjct: 121 PGVMLAQSIGRWGNFVNQEAYGKTVTQLNYLPDFIRKQMYIDGHYRTPTFLYESLWNLLG 180



Query: 181 FVIIMMWRRKPKSLLDGEIFAFYLIWYGSGRLVIEGMRTDSLMFLGIRISQYVSALLIII 240 

F+IIM+ RR+P L +GE+ FYLIWYGSGR VIEGMRTDSLMF +R+SQ++S LL+++
Sbjct: 181 FIIIMILRRRPNLLKEGEVAFFYLIWYGSGRFVIEGMRTDSLMFASLRVSQWLSVLLVVV 240



Query: 241 GLIFVIKRRRQKGISYYQ 258

G+I ++ RRR I YYQ 
Sbjct: 241 GVILMVIRRRNHAIPYYQ 258 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 176/257 (68%) , Positives = 221/257 (85%)

Query: 2 MINPVAIRLGPFSIRWYAICIVSGMLLAVYLAMKEAPRKNIKSDDILDFILMAFPLSIVG 61 

MINP+A++ GP +I WYA+CI+SG++LAVYLA KEAP+K I SD I DFIL+AFPL+IVG
Sbjct: 1 MINPIALKCGPLAIHWYALCILSGLVLAVYLASKEAPKKGISSDAIFDFILIAFPLAIVG 60

Query: 62 ARIYYVIFEWAYYSKHPVEIIAIWNGGIAIYGGLITGAILLVIFSYRRLINPIDFLDIAA 121
ARIYYVIFEW+YY KH EIIAIWNGGIAIYGGLITGA++L+ + Y +++NPI FLDIAA
Sbjct: 61 ARIYYVIFEWSYYVKHLDEIIAIWNGGIAIYGGLITGALVLLAYCYNKVLNPIHFLDIAA 120

Query: 122 PGVMIAQAIGRWGNFINQEAYGRAVKNLNYVPNFIKNQMYIDGAYRVPTFLYESLWNFLG 181

P VM+AQAIGRWGNFINQEAYG+AV LNY+P+FI+ QM+I+G+YR+PTFLYESLWN LG 
Sbjct: 121 PSVMVAQAIGRWGNFINQEAYGKAVSQLNYLPSFIQKQMFIEGSYRIPTFLYESLWNLLG 180 

Query: 182 FVIIMSIRHRPRTLKQGEVACFYLVWYGCGRFIIEGMRTDSLYLAGLRVSQWLSVILVII 241
FVIIM R +P++L GE+ FYL+WYG GR +IEGMRTDSL G+R+SQ++S +L+II
Sbjct: 181 FVIIMMWRRKPKSLLDGEIFAFYLIWYGSGRLVIEGMRTDSLMFLGIRISQYVSALLIII 240

Query: 242 GIVMIIYRRREQHISYY 258
G++ +I RRR++ ISYY
Sbjct: 241 GLIFVIKRRRQKGISYY 257

A related GBS gene <SEQ ID 8557> and protein <SEQ ID 8558> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0

McG: Discrim Score: 2.45 
GvH: Signal Score (-7.5) : -2.9 

Possible site: 39 
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 3 value: -8.39 threshold: 0.0

INTEGRAL Likelihood = -8.39 Transmembrane 209 - 225 ( 203 - 229) 
INTEGRAL Likelihood = -7.64 Transmembrane 67 - 83 ( 65 - 85) 
INTEGRAL Likelihood = -1.86 Transmembrane 24 - 40 ( 24 - 42) 
PERIPHERAL Likelihood =0.79 92 
modified ALOM score: 2.18

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.4354 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01400(238 - 1008 of 1308)

SP|P72482|LGT_STRMU(1 - 257 of 259) PROLIPOPROTEIN DIACYLGLYCERYL TRANSFERASE (EC 2.4.99.-).
GP|4583534|gb|AAC80171.3||U75480 putative prolipoprotein diacylglycerol transferase
{Streptococcus mutans} PIR|T11569|T11569 prolipoprotein diacylglyceryl transferase (EC
2.4.99.-) - Streptococcus mutans

%Match = 46.9

%Identity =71.6 %Similarity = 89.5 

Matches = 184 Mismatches = 27 Conservative Sub.s = 46 

198 228 258 288 318 348 378 408 

WGLMLPRLLRIV*HI*LVRTRSMMINPVAIRLGPFSIRWYAICIVSGMLLAVYLAMKEAPRKNIKSDDILDFILMAFPLS

lll|:|| = !l|::|ll|:||lbh = lllll = = 1 I I = I I I I I I I : I I I I I = I I I I = 
MINPIAIKLGPLTIRWYSICIVTGLILAVYLTIREAPKKNIKSDDVLDFILIAFPLA 
10 20 30 40 50 

438 468 498 528 558 588 618 648

IVGARIYYVIFEWAYYSKHPVEIIAIWNGGIAIYGGLITGAILLVIFSYRRLINPIDFLDIAAPGVMIAQAIGRWGNFIN

IVGARLYYVIFDWDYYLKNPSEIPVIWHGGIAIYGGLLTGALVLFIFSYYRMIKPIDFLDVAAPGVMLAQSIGRWGNFVN
70 80 90 100 110 120 130



678 708 738 768 798 828 858 888 

QEAYGRAVKNLNYVPNFIKNQMYIDGAYRVPTFLYESLWNFLGFVIIMSIRHRPRTLKQGEVACFYLVWYGCGRFIIEGM
llllh | |||:|:||: |||||! || | | | | | | | | | | : | | | : | | | :| || ||:|||| ||:|| llhllll 
QEAYGKTVTQLNYLPDFIRKQMYIDGHYRTPTFLYESLWNLLGFIIIMILRRRPNLLKEGEVAFFYLIWYGSGRFVIEGM
150 160 170 180 190 200 210
918 948 978 1008 1038 1068 1098 1128

RTDSLYLAGLRVSQWLSVILVIIGIVMIIYRRREQHISYY*TEEVL**KLLy*LLPLRLLF*F*EYFSP*KKYQKRLRKP

lllll =1 |||||||||:||::|::::: III : I || 
RTDSLMFASLRVSQWLSVLLVVVGVILMVIRRRNHAIPYYQC

230 240 250 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 344 

A DNA sequence (GBSx0375) was identified in S.agalactiae <SEQ ID 1115> which encodes the amino
acid sequence <SEQ ID 1116>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.2817 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAA77782 GB:AB027460 Hpr kinase [Streptococcus bovis] 
Identities = 264/309 (85%) , Positives = 292/309 (94%) 

Query: 1 MAVTVQMLVDRLKLNVIYGDEHLLSKRITTADISRPGLEMTGYFDYYAPERLQLVGMKEW 60
M+VTV+MLVD++KL+VIYGD+ LLSK ITT+DISRPGLEMTGYFDYY+PERLQL+GMKEW
Sbjct: 1 MSVTVKMLVDKVKLDVIYGDDDLLSKEITTSDISRPGLEMTGYFDYYSPERLQLLGMKEW 60

Query: 61 SYLMAMTGHNRYQVLREMFQKETPAIVVARDLEIPEEMYEAAKDTGIAILQSKAPTSRLS 120
SYL MT HNR VLREM + ETPAI+VAR+L IPEEM AAK+ GIAILQS PTSRLS
Sbjct: 61 SYLTKMTSHNRRHVLREMIKPETPAIIVARNLAIPEEMISAAKEKGIAILQSHVPTSRLS 120

Query: 121 GEVSWYLDSCLAERTSVHGVLMDIYGMGVLIQGDSGIGKSETGLELVKRGHRLVADDRVD 180 

GE+SKYLDSCLAERTSVHGVLMDIYGMGVLIQGDSGIGKSETGLELVKRGHRLVADDRVD 
Sbjct: 121 GEMSTOLDSCLAERTSVHGVLMDIYGMGVLIQGDSGIGKSETGLELVKRGHRLVADDRVD 180

Query: 181 VYAKDEETLWGEPAEILRHLLEIRGVGIIDIMSLYGASAVKDSSQVQLAIYLENFETGKV 240 

V+AKDEETLWGEPAEILRHLLEIRGVGIID+MSLYGASAVKDSSQVQLAIYLEN+E+GKV 
Sbjct: 181 VFAKDEETLWGEPAEILRHLLEIRGVGIIDVMSLYGASAVKDSSQVQLAIYLENYESGKV 240

Query: 241 FDRLGNGNEEIELSGVKVPRIRIPVKTGRNVSVVIEAAAMNHRAKQMGFDATQTFEDRLT 300
FDRLGNGNEE+ELSGVK+PR+RIPV+TGRN+SVVIEAAAMN+RAKQMGFDAT+TFE+RLT
Sbjct: 241 FDRLGNGNEELELSGVKIPRLRIPVQTGRNMSVVIEAAAMNYRAKQMGFDATKTFEERLT 300

Query: 301 HLISQNEVN 309
LI++NE N

Sbjct: 301 QLITKNEGN 309 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1117> which encodes the amino acid
sequence <SEQ ID 1118>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2391 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 255/309 (82%) , Positives = 288/309 (92%) 


Query: 1 MAVTVQMLVDRLKLNVIYGDEHLLSKRITTADISRPGLEMTGYFDYYAPERLQLVGMKEW 60
M VTV+MLV ++KL+V+Y ++LLSK ITT+DISRPGLEMTGYFDYYAPERLQL GMKEW
Sbjct: 32 MTVTVKMLVQKVKLDVWATDMjLSKEITTSDISRPGLEMTGYFDYYAPERLQLFGMKEW 91

Query: 61 SYLMAMTGHNRYQVLREMFQKETPAIVVARDLEIPEEMYEAAKDTGIAILQSKAPTSRLS 120
SYL MT HNRY VL+EMF+K+TPA+VV+R+L IP+EM +AAK+ GI++L S+ TSRL+
Sbjct: 92 SYLTQMTSHNRYSVLKEMFKKDTPAVVVSRNLAIPKEMVQAAKEEGISLLSSRVSTSRLA 151

Query: 121 GEVSWYLDSCLAERTSVHGVLMDIYGMGVLIQGDSGIGKSETGLELVKRGHRLVADDRVD 180
GE+S++LD+ LAERTSVHGVLMDIYGMGVLIQGDSGIGKSETGLELVKRGHRLVADDRVD
Sbjct: 152 GEMSYFLDASLAERTSVHGVLMDIYGMGVLIQGDSGIGKSETGLELVKRGHRLVADDRVD 211

Query: 181 VYAKDEETLWGEPAEILRHLLEIRGVGIIDIMSLYGASAVKDSSQVQLAIYLENFETGKV 240
VYAKDEETLWGEPAEILRHLLEIRGVGIID+MSLYGASAVKDSSQVQLAIYLENFE GKV
Sbjct: 212 VYAKDEETLWGEPAEILRHLLEIRGVGIIDVMSLYGASAVKDSSQVQLAIYLENFEAGKV 271

Query: 241 FDRLGNGNEEIELSGVKVPRIRIPVKTGRNVSVVIEAAAMNHRAKQMGFDATQTFEDRLT 300
FDRLGNGNEE I SGV++PRIRIPVKTGRNVSVVIEAAAMNHRAK+MGFDAT+TFEDRLT
Sbjct: 272 FDRLGNGNEEITFSGTOIPRIRIPVKTGRNVSVVIEAAAMNHRAKEMGFDATKTFEDRLT 331

Query: 301 HLISQNEVN 309
LI++NEV+
Sbjct: 332 QLITKNEVS 340


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 345 

A DNA sequence (GBSx0376) was identified in S.agalactiae <SEQ ID 1119> which encodes the amino
acid sequence <SEQ ID 1120>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1836 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9719> which encodes amino acid sequence <SEQ ID 9720> 
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 346

A DNA sequence (GBSx0377) was identified in S.agalactiae <SEQ ID 1121> which encodes the amino
acid sequence <SEQ ID 1122>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -4.88 Transmembrane 35 - 51 ( 31 - 59)

Final Results 

bacterial membrane Certainty=0.2954 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC67275 GB:AF017113 YvlC [Bacillus subtilis] 
Identities = 21/63 (33%) , Positives = 36/63 (56%) , Gaps = 2/63 (3%) 

Query: 3 SSFYKQRKGKLVCGVVAGIADKYNWDLALSRVLIALILYFTKF--GLLLYILLAVFLPYK 60

+ Y+ K K + GV+ GLA+ +NWD +L RV+ ++ T LL+YI+ +P + 

Sbjct: 2 NKLYRSEKNKKIAGVIGGLAEYFNWDASLLRVITVIIAIMTSVLPVLLIYIIWIFIVPSE 61 

Query: 61 EDI 63 
D+ 

Sbjct: 62 RDM 64 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1123> which encodes the amino acid
sequence <SEQ ID 1124>. Analysis of this protein sequence reveals the following: 



Possible site: 32 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -5.26 Transmembrane 39 - 55 ( 31 - 61) 



Final Results 

bacterial membrane Certainty=0.3102 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 60/90 (66%), Positives = 77/90 (84%), Gaps = 3/90 (3%) 

Query: 1 MKSSFYKQRKGKLVCGVVAGIADKYNWDLALSRVLIALILYFTKFGLLLYILLAVFLPYK 60
+++ FYKQRK +LV GV+AGLADKY WDLAL+RVL AL++Y T FG+LLYILLA+FLPYK
Sbjct: 1 WTKFYKQRKNRLVAGVIAGLADKYGWDLALARVLAALLIYGTGFGVLLYILLAIFLPYK 60

Query: 61 EDIIETR-RQGPRRRKDAEPV--DDDGWFW 87
ED++E R +GPRRRKDA+ + ++DGWFW
Sbjct: 61 EDLLEERYGRGPRRRKDADVLNEEEDGWFW 90

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 347 

A DNA sequence (GBSx0378) was identified in S.agalactiae <SEQ ID 1125> which encodes the amino
acid sequence <SEQ ID 1126>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.3577 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9717> which encodes amino acid sequence <SEQ ID 9718>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 






>GP:BAB04250 GB:AP001508 unknown conserved protein [Bacillus halodurans] 
Identities = 379/729 (51%) , Positives = 515/729 (69%) , Gaps = 25/729 (3%) 

Query: 29 ENLNITQIAIDLGIKASQIEKVLELTDEGNTIPFIARYRKEMTGNLDEVQIKSIIDLDKS 88 
E I +A +L +K + I++V++L EGNT+PFIARYRKE+TG +DEV+I+ + +
Sbjct: 8 EEHTIKTIAKELSLKPN^IKQVIQLLHEGNTVPFIARYRKELTGGMDEVKIREVSEKWTY 67

Query: 89 MTALSDRKTTVLAKIEEQGKLTQELKKAIEEATKLADVEELYLPYKEKRRTKATIAREAG 148
L +RK V+ +EEQGKLT E KK +E+A KL +VE+LY PYK+KRRT+AT+A+E G
Sbjct: 68 ANQLHERKEEVIRLVEEQGKLTDEWKKTVEQAQKLQEVEDLYRPYKQKRRTRATVAKEKG 127

Query: 149 LFPLARLI--LQNKDNLEEEAQNYLTDGFETTT--KALSGAVDILIEAFSEDNKLRSWTY 204 
L PLA + L + +EA+ YL+ E T L GA DI+ E ++D LR 

Sbjct: 128 LEPLAEWLFSLPRDGDPLQEAEVYLSTOHELTKVEDVLQGAQDIIAEWIADDADLRKRIR 187

Query: 205 NEIWNYSSITAVVKDESLDEKQVFKIYYDFSEKISKLHGYQVIALNRGEKMGVLKVNFEH 264
+ + S+ A VK E LDEK V+++YYD+ E + L ++ IALNRGEK VL+V
Sbjct: 188 SLGFKEGSVIAKVKKEELDEKGVYEMYYDYEEPVRTLVPHRTIALNRGEKEDVLRVTIRF 247


Query: 265 NLEKMFRF FAVRFKETS-QYIDDLIVQTVKKKIVPAMERRIRTELSEGAEDGAISL 319 

++++ F RF + Y+ I K+ I P++ER IR EL+E AE+ AI + 

Sbjct: 248 PVDRI IEMSEKTFIRRFGSPAVPYVKAAIEDGYKRLIEPSIEREIRHELTEKAEEQAIHI 307 

Query: 320 FSENLRNLLLVSPLKGKMVLGFDPAFRTGAKLAVVDQTGKLMTTQVIYPVPPANQAKIEQ 379

F+ENLR+LLL P+KGK+VLG DPA+RTG KLA+VD+TGK++ QVIYP PP N+ + 
Sbjct: 308 FAENLRSLLLQPPIKGKVVLGLDPAYRTGCKLAIVDETGKVLDIQVIYPTPPKNE--VAA 365

Query: 380 SKIEIAKLIKEFNIEIIAIGNGTASRESEAFVAEVLQDFPD-VSYVIVNESGASVYSASE 438
+K + KLI ++ +E+IAIGNGTASRESE F+A++++D P + Y+IVNE+GASVYSASE
Sbjct: 366 AKKIVKKLIADYGVEMIAIGNGTASRESEQFIADLIKDLPQTIYYLIVNEAGASVYSASE 425

Query: 439 LARHEFPDLTVEKRSAISIARRLQDPLAELVKIDPKSIGVGQYQHDVSQKKLAENLDFVV 498
+ R EFPDL VE+RSA+SIARRLQDPLAELVKIDPKS+GVGQYQHDVSQK+L E+L FVV
Sbjct: 426 IGREEFPDLQVEERSAVSIARRLQDPLAELVKIDPKSVGVGQYQHDVSQKRLNESLTFVV 485

Query: 499 ETVVNQVGVNVNTASPALLAHVSGLNKTISENIVKYREENGQIKSRAEIKKVPRLGAKAF 558
ETVVNQVGVNVNTASP+LL +V+GL+KT+++NIVK REE G+ +RA++K +PRLGAK +
Sbjct: 486 ETVVNQVGVNVNTASPSLLQYVAGLSKTVAKNIVKKREEAGRFTARRQLKDIPRLGAKTY 545


Query: 559 EQAAGFLRIPNAKNFLDNTGVHPESYEAVKKLLDQLTIKELD DLAKEKLQNLDLIAT 615 

EQ GFLRI + N LD T +HPESY+ KLL ++ D + K+KLQ LD+ A 

Sbjct: 546 EQCIGFLRIMDGDNLLDATAIHPESYKVTDKLLSEVGATAADVGIEDLKKKLQALDVSAM 605 

Query: 616 AESIGVGQETLKDIIEDLLKPGRDLRDDFEAPVLRHDVLDVSDLKVGQELQGTVRNVVDF 675
A ++ VG TLKD+I+ L++P RD RD+ P+L+ DVL + DL G ELQGTVRNVVDF
Sbjct: 606 AATLDVGVPTLKDMIDALIRPTRDPRDEVAKPLLKQDVLQLEDLLPGMELQGTVRNVVDF 665

Query: 676 GAFVDIGVHEDGLIHQSRLIKRKRDKKTRKMPPLQHPSKYLSVGDIVTVWVVEVDAERSR 735
G FVDIGV +DGL+H S+L R ++HP + ++VG+IVTVW +VD ++ R
Sbjct: 666 GVFVDIGVKQDGLVHISKLANRY IKHPLEVVTVGEIVTVWVEDVDIKKGR 715

Query: 736 IGLSLIKPD 744
I L++++P+
Sbjct: 716 IALTMLRPE 724

A related DNA sequence was identified in S.pyogenes <SEQ ID 1127> which encodes the amino acid
sequence <SEQ ID 1128>. Analysis of this protein sequence reveals the following:

Possible site: 25
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2207 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 532/716 (74%) , Positives = 619/716 (86%) , Gaps = 10/716 (1%)

Query: 28 MENLNITQIAIDLGIKASQIEKVLELTDEGNTIPFIARYRKEMTGNLDEVQIKSIIDLDK 87
MEN N IA L + QIE+VL LT +GNTIPFIARYRKE+TGNLDEV IKSIID+DK
Sbjct: 1 MENNNNHNIAEALSVSLHQIEQVLALTACjGNTIPFIARYRK^ 60 

Query: 88 SMTALSDRKTTVLAKIEEQGKLTQELKKAIEEATKLADVEELYLPYKEKRRTKATIAREA 147
S+T L++RK T+LAKIEEQGKLT +L+ +IE KLAD+EELYLPYKEKRRTKATIAREA
Sbjct: 61 SLTTLNERKATILAKIEEQGKLTDQLRTSIEATEKLADLEELYLPYKEKRRTKATIAREA 120

Query: 148 GLFPLARLILQNKDNLEEEAQNYLTDGFETTTKALSGAVDILIEAFSEDNKLRSWTYNEI 207
GLFPLARLILQN NLE A+ ++T+GF + +AL+GAVDIL+EA SED KLRSWTYNEI
Sbjct: 121 GLFPLARLILQNAQNLETAAEPFVTEGFASPQEALAGAVDILVEAMSEDAKLRSWTYNEI 180

Query: 208 WNYSSITAVVKDESLDEKQVFKIYYDFSEKISKLHGYQVIALNRGEKMGVLKVNFEHNLE 267
W YS + + +KDE LDEK+VF+IYYDFS+++S + GY+ IALNRGEK+G+LKV+FEHNLE
Sbjct: 181 WQYSRLVSTLKDEQLDEKKVFQIYYDFSDQVSNMQGYRTIALNRGEKLGILKVSFEHNLE 240


Query: 268 KMFRFFAVRFKETSQYIDDLIVQTVKKKIVPAMERRIRTELSEGAEDGAISLFSENLRNL 327 

KM RFF+VRFKET+ YI+++I QT+KKKIVPAMERR+R+ELS+ AEDGAI LFSENLR+L 
Sbjct: 241 KMQRFFSVRFKETNPYIEEVINQTIKKKIVPAMERRVRSELSDAAEDGAIHLFSENLRHL 300

Query: 328 LLVSPLKGKMVLGFDPAFRTGAKLAVVDQTGKLMTTQVIYPVPPANQAKIEQSKIELAKL 387
LLVSPLKGKMVLGFDPAFRTGAKLA+VDQTGKL+TTQVIYPV PA+Q KI+ +K L +L
Sbjct: 301 LLVSPLKGKMVLGFDPAFRTGAKLAIVDQTGKLLTTQVIYPVAPASQTKIQAAKETLTQL 360

Query: 388 IKEFNIEIIAIGNGTASRESEAFVAEVLQDFPDVSYVIVNESGASVYSASELARHEFPDL 447
I+ + I+IIAIGNGTASRESEAFVA+VL+DFP+ SYVIVNESGASVYSASELARHEFPDL
Sbjct: 361 IETYQIDIIAIGNGTASRESEAFVADVLKDFPNTSYVIVNESGASVYSASELARHEFPDL 420

Query: 448 TVEKRSAISIARRLQDPLAELVKIDPKSIGVGQYQHDVSQKKLAENLDFVVETVVNQVGV 507
TVEKRSAISIARRLQDPLAELVKIDPKSIGVGQYQHDVSQKKL+ENL FVV+TVVNQVGV
Sbjct: 421 TVEKRSAISIARRLQDPLAELVKIDPKSIGVGQYQHDVSQKKLSENLGFVVDTVVNQVGV 480

Query: 508 NVNTASPALLAHVSGLNKTISENIVKYREENGQIKSRAEIKKVPRLGAKAFEQAAGFLRI 567
NVNTASP+LLAHVSGLNKTISENIVKYREENG + SRA+IKKVPRLGAKAFEQAAGFLRI
Sbjct: 481 NVNTASPSLLAHVSGLNKTISENIVKYREE^^ 540


Query: 568 PNAKNFLDNTGVHPESYEAVKKLLDQLTIKELDDLAKEKLQNLDLIATAESIGVGQETLK 627
P AKN LDNTGVHPESY AVK+L L I++LDD AK L + + AE++ +GQETLK
Sbjct: 541 PGAKNILDNTGVHPESYPAVKELFKVLGIQDLDDAAKATLflAVQVPQMAETLAIGQETLK 600

Query: 628 DIIEDLLKPGRDLRDDFEAPVLRHDVLDVSDLKVGQELQGTVRNVVDFGAFVDIGVHEDG 687
DII DLLKPGRDLRDDFEAP+LR D+LD+ DL++GQ+L+GTVRNVVDFGAFVDIGVHEDG
Sbjct: 601 DIIADLLKPGRDLRDDFEAPILRQDILDLKDLEIGQKLEGTVRNVVDFGAFVDIGVHEDG 660

Query: 688 LIHQSRLIKRKRDKKTRKMPPLQHPSKYLSVGDIVTVWVVEVDAERSRIGLSLIKP 743
LIH S + K + HPS+ +SVGD+VTVWV ++D +R ++ LSL+ P
Sbjct: 661 LIHISEMSKTF VNHPSQVVSVGDLVTVWVSKIDLDRHKVNLSLLPP 706

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 348

A DNA sequence (GBSx0379) was identified in S.agalactiae <SEQ ID 1129> which encodes the amino 
acid sequence <SEQ ID 1130>. This protein is predicted to be N5,N10-methylenetetrahydromethanopterin 
reductase homolog. Analysis of this protein sequence reveals the following: 

Possible site: 60 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.4864 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 
>GP:AAB94650 GB:U96107 N5,N10-methylenetetrahydromethanopterin
reductase homolog [Staphylococcus carnosus]
Identities = 164/300 (54%) , Positives = 217/300 (71%) , Gaps = 1/300 (0%) 

Query: 45 VYGIGEHHREDFAVSAPEIVLAAGAVRTNNIRLSSAVTILSSNDPIRVYQQFSTIDALSN 104
+YG+GEHHR D+AVS P VLAA A T I+LSSAVT+LSS+DP+ VY++F+T+DA+SN
Sbjct: 1 MYGLGEHHRSDYAVSDPVTVLAAARSLTQRIKLSSAVTVLSSDDPVCVYERFATLDAVSN 60

Query: 105 GRAEIMAGRGSFIESFPLFGYDLADYDDLFNEKMDMLIAINSATNLDWKGHLTQTVNERP 164
GRAEIM GRGSFIESFPLFGYDL DYD LF EK+++L IN + W+G + +

Sbjct: 61 GRAEIMVGRGSFIESFPLFGYDLDDYDRLFVEKLELLKEINQHEWTWEGTMRPAIKGLG 120 

Query: 165 IYPRALQRQLPIWVATGGfJVDSTIRIAEQGLPIVYATIGGNPKAFRQLVHIYKEVGSRNG 224 
+YPRA+Q ++PIW+ATGG +S+IR AE GLPI YA IGGNPK F++ + IY+ V G 
Sbjct: 121 VYPRAVQDEIPIWLATGGTPESSIRAAEFGLPITYAIIGGNPKRFKRNIAIYRAVAESRG 180

Query: 225 HKPEQLKVAAHSWGWIEEDNQTAIDRYFFPTKQTVDNIAKGRPHWSEMTKEQYLRSVGPE 284 

+ + VA HSWG+I + ++ A ++ PTK + IAK R +W T+ + R + E 
Sbjct: 181 YDLADMPVAVHSWGYIADTDEQAQREFYEPTKVHHEIIAKER-NWPPYTEAHFQREISDE 239 


Query: 285 GAIFVGSPEWAHKIIGLVEALELDRFMLHLPVGSMPHKDVLNAIKLYGKEVAPIVRKYF 344 

GA+FVGSPE VA K+I ++E L L+RFMLH+PVGSMPH+ ++ AIKLYGK V PI+ YF 
Sbjct: 240 GAMFVGSPETVARKMIKVIEELGLNRFMLHIPVGSMPHERIMKAIKLYGKRVKPIIEDYF 299

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 349 

A DNA sequence (GBSx0380) was identified in S.agalactiae <SEQ ID 1131> which encodes the amino
acid sequence <SEQ ID 1132>. Analysis of this protein sequence reveals the following:

Possible site: 22 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1310 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9715> which encodes amino acid sequence <SEQ ID 9716> 
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1133> which encodes the amino acid
sequence <SEQ ID 1134>. Analysis of this protein sequence reveals the following:

Possible site: 25 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0915 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 20/40 (50%) , Positives = 27/40 (67%) , Gaps = 3/40 (7%) 



Query: 4 MAITHKRQDDLESMFASFAKVP---KPKKVDSDSKPEQKD 40

MAITHK+ D+LE M A FA +P KP +V++D K K+
Sbjct: 1 MAITHKKNDELEKMLAGFASIPSFDKPLEVNTDGKLATKE 40
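
The identity, positive and gap counts quoted for an alignment can be recovered from its three rows: in the middle row a letter marks an identical residue, "+" marks a conservative substitution, and a "-" in either sequence row marks a gap. The Python sketch below illustrates this for the short GAS/GBS alignment above. The function name `midline_counts` is hypothetical, and the column spacing of the middle row has been restored by hand, which is an assumption because the spacing is not preserved in the text.

```python
def midline_counts(query, midline, sbjct):
    """Return (identities, positives, gaps, length) for one aligned block."""
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("alignment rows must have equal length")
    # A letter in the middle row marks an identical residue pair.
    identities = sum(1 for ch in midline if ch.isalpha())
    # Positives are identities plus conservative substitutions ("+").
    positives = identities + midline.count("+")
    # A gap column has "-" in either sequence row.
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return identities, positives, gaps, len(query)


# GBS residues 4-40 against GAS residues 1-40, with the 3-residue gap restored.
QUERY = "MAITHKRQDDLESMFASFAKVP" + "---" + "KPKKVDSDSKPEQKD"
MIDLINE = "MAITHK+ D+LE M A FA +P" + " " * 3 + "KP +V++D K" + " " * 3 + "K+"
SBJCT = "MAITHKKNDELEKMLAGFASIPSFDKPLEVNTDGKLATKE"

print(midline_counts(QUERY, MIDLINE, SBJCT))
```

With the spacing restored in this way, the counts agree with the summary line given for this alignment (20 identities, 27 positives and 3 gaps over 40 columns).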

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 350 

A DNA sequence (GBSx0381) was identified in S.agalactiae <SEQ ID 1135> which encodes the amino 
acid sequence <SEQ ID 1 136>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.1453 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 351

A DNA sequence (GBSx0382) was identified in S.agalactiae <SEQ ID 1137> which encodes the amino
acid sequence <SEQ ID 1138>. Analysis of this protein sequence reveals the following:



Possible site: 37 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.5458 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12119 GB:Z99105 ycgR [Bacillus subtilis] 
Identities = 141/283 (49%), Positives = 198/283 (69%), Gaps = 3/283 (1%)

Query: 10 SVLQWFAIFISIIIEALPFVLLGTILSGIIEVFITPDIVNKFLPKNKFLRVLFGTFVGFV 69 

S LQ +IFISI+IEA+PF+L+G ILSGII++F++ +++ + +PKN+FL VLFG G + 
Sbjct: 6 SFLQLNSIFISILIEAIPFILIGVILSGIIQMFVSEEMIARIMPKNRFIAVLFGALAGVL 65 


Query: 70 FPSCECGIIPIINRFLEKKVPSYTAVPFLATAPIINPIVLFATYSAFGNSIRFLILRFVG 129 

FP+CECGIIPI R L K VP + V F+ TAPIINPIVLF+TY AFGN + R 
Sbjct: 66 FPACECGIIPITRRLLLKGVPLHAGVAFMLTAPIINPIVLFSTYIAFGNRWSWFYRGGL 125 

Query: 130 ATIVAIALGVMLAFLVDDNILKEDAKPTHFHDYSDKKWYQKIFLALAHAIDEFFDTGRYL 189

A V++ +GV+L++ DN L + +P H H + QK+ L HAIDEFF G+YL 

Sbjct: 126 ALAVSLIIGVILSYQFKDNQLLKPDEPGHHHHHHGTL-LQKLGGTLRHAIDEFFSVGKYL 184 

Query: 190 VFGTLIASAMQIYLPTRVLTTIGHSPITAILVMMLLAFILSLCSEADAFIGASLLSTFGI 249 
+ G IA+AMQ Y+ T L IG + +++ LVMM LAF+LSLCSE DAFI +S STF +
Sbjct: 185 IIGAFIAAAMQTYVKTSTLLAIGQNDVSSSLVMMGLAFVLSLCSEVDAFIASSFSSTFSL 244

Query: 250 APVMAFLLIGPMIDIKNLMMMVNSFKTRFIVQFISVSSLIIII 292 

++AFL+ G M+DIKNL+MM+ +FK RF+ F+ ++ 
Sbjct: 245 GSLIAFLVFGAMVDIKNLLMMLAAFKKRFV--FLLITYIWIV 285 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1139> which encodes the amino acid
sequence <SEQ ID 1140>. Analysis of this protein sequence reveals the following:

Possible site: 25 
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.4970 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAB12119 GB:Z99105 ycgR [Bacillus subtilis]
Identities = 143/288 (49%), Positives = 196/288 (67%), Gaps = 1/288 (0%) 

Query: 10 SVLQWFAIFMSIIIEALPFVLLGTILSGCIEVFVTPELVQKYLPKQKCLRILFGTFVGFV 69

S LQ +IF+SI+IEA+PF+L+G ILSG I++FV+ E++ + +PK + L +LFG G + 
Sbjct: 6 SFLQINSIFISILIEAIPFILIGVILSGIIQMFVSEEMIARIMPKNRFIAVLFGALAGVL 65 

Query: 70 FPSCECGIIPIINRFLEKKVPSYTAVPFLATAPIINPIVLFATYSAFGNSLRFLILRLVG 129 

FP+CECGIIPI R L K VP + V F+ TAPIINPIVLF+TY AFGN + R 
Sbjct: 66 FPACECGIIPITRRLLLKGVPLHAGVAFMLTAPIINPIVLFSTYIAFGNRWSWFYRGGL 125 

Query: 130 ARLVAITLGVMLAFIVDDNILKDNAQPVHFHDYSHESLPKRIYLALVHAIDEFFDTGRYL 189 

A V++ +GV+L++ DN L +P H H + H +L +++ L HAIDEFF G+YL 
Sbjct: 126 ALAVSLIIGVILSYQFKDNQLLKPDEPGH-HHHHHGTLLQKLGGTLRHAIDEFFSVGKYL 184 

Query: 190 VFGTLIASAMQIYVPTRVLTTIGHNPLTAILIMMLMAFILSLCSEADAFIGASLLSTFGV 249 

+ G IA+AMQ YV T L IG N +++ L+MM +AF+LSLCSE DAFI +S STF + 
Sbjct: 185 IIGAFIAAAMQTYVKTSTLLAIGQNDVSSSLVMMGLAFVLSLCSEVDAFIASSFSSTFSL 244 

Query: 250 APVLAFLLIGPMVDIKNLMMMVKAFKGRFIVQFIGVSVLMIAVYCLLV 297 

++AFL+ G MVDIKNL+MM+ AFK RF+ I V+++ LLV 
Sbjct: 245 GSLIAFLVFGAMVDIKNLLMMLAAFKKRFVFLLITYIWIVLAGSLLV 292 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 248/300 (82%) , Positives = 278/300 (92%) 

Query: 1 MDIFNQLPDSVLQWFAIFISIIIEALPFVLLGTILSGIIEVFITPDIVNKFLPKNKFLRV 60 

M +F+ LP SVLQWFAIF+SIIIEALPFVLLGTILSG IEVF+TP++V K+LPK K LR+ 
Sbjct: 1 MSLFSNLPPSVLQWFAIFMSIIIEALPFVLLGTILSGCIEVFVTPELVQKYLPKQKCLRI 60 

Query: 61 LFGTFVGFVFPSCECGIIPIINRFLEKKVPSYTAVPFLATAPIINPIVLFATYSAFGNSI 120 

LFGTFVGFVFPSCECGIIPIINRFLEKKVPSYTAVPFLATAPIINPIVLFATYSAFGNS+
Sbjct: 61 LFGTFVGFVFPSCECGIIPIINRFLEKKVPSYTAVPFLATAPIINPIVLFATYSAFGNSL 120

Query: 121 RFLILRFVGATIVAIALGVMLAFLVDDNILKEDAKPTHFHDYSDKKWYQKIFLALAHAID 180

RFLILR VGA +VAI LGVMLAF+VDDNILK++A+P HFHDYS + ++I+LAL HAID 
Sbjct: 121 RFLILRLVGAALVAITLGVMLAFIVDDNILKDNAQPVHFHDYSHESLPKRIYLALVHAID 180 



Query: 181 EFFDTGRYLVFGTLIASAMQIYLPTRVLTTIGHSPITAILVMMLLAFILSLCSEADAFIG 240 
EFFDTGRYLVFGTLIASAMQIY+PTRVLTTIGH+P+TAIL+MML+AFILSLCSEADAFIG
Sbjct: 181 EFFDTGRYLVFGTLIASAMQIYVPTRVLTTIGHNPLTAILIMMLMAFILSLCSEADAFIG 240 

Query: 241 ASLLSTFGIAPVMAFLLIGPMIDIKNLMMMVNSFKTRFIVQFISVSSLIIIIYCLFVGVI 300
ASLLSTFG+APV+AFLLIGPM+DIKNLMMMV +FK RFIVQFI VS L+I +YCL VGV+
Sbjct: 241 ASLLSTFGVAPVIAFLLIGPMVDIKNLMMMVKAFKGRFIVQFIGVSVLMIAVYCLLVGVL 300

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
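
The summary line that heads each alignment above (for example "Identities = 248/300 (82%)") can be read mechanically. The following Python sketch is illustrative only and is not part of the analysis described in this specification; the function name parse_alignment_summary is hypothetical. It extracts each count, the aligned length and the percentage as printed, and also reports the unrounded fraction so that the printed figure can be compared with the raw counts.

```python
import re

# One field of a summary line, e.g. "Identities = 248/300 (82%)".
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_alignment_summary(line):
    """Map each field name to (count, aligned_length, printed_percent, fraction)."""
    stats = {}
    for name, count, length, percent in FIELD.findall(line):
        count, length = int(count), int(length)
        stats[name] = (count, length, int(percent), count / length)
    return stats


if __name__ == "__main__":
    line = "Identities = 248/300 (82%) , Positives = 278/300 (92%)"
    for name, (count, length, percent, fraction) in parse_alignment_summary(line).items():
        print(f"{name}: {count}/{length}, printed {percent}%, computed {fraction:.4f}")
```

The regular expression tolerates the stray space before each comma that appears in the summary lines of this text. Fields that are absent from a line (such as Gaps in the example) are simply not returned.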

Example 352

A DNA sequence (GBSx0383) was identified in S.agalactiae <SEQ ID 1141> which encodes the amino 
acid sequence <SEQ ID 1142>. Analysis of this protein sequence reveals the following: 






Possible site: 13 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.4703 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
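
The "Final Results" blocks in these examples all share one line layout: a location, a certainty value and a verdict. As a minimal sketch, assuming that layout, the Python below collects the three values per line and reports the location with the highest certainty. The names parse_final_results and predicted_location are hypothetical, and the pattern is written to tolerate the stray spaces inside numbers that occur in this text (for example "0 .4703").

```python
import re

# Location, certainty and verdict from one "Final Results" line.
RESULT_LINE = re.compile(
    r"(bacterial \w+)\s+Certainty\s*=\s*([0-9. ]+?)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, verdict) tuples found in text."""
    results = []
    for raw in text.splitlines():
        match = RESULT_LINE.search(raw)
        if match:
            location, value, verdict = match.groups()
            # Remove spaces that split the number, e.g. "0 .4703" -> "0.4703".
            results.append((location, float(value.replace(" ", "")), verdict))
    return results


def predicted_location(results):
    """Return the location with the highest certainty (first one wins a tie)."""
    return max(results, key=lambda item: item[1])[0]


if __name__ == "__main__":
    block = """Final Results
bacterial cytoplasm Certainty=0 .4703 (Affirmative) < succ>
bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>"""
    parsed = parse_final_results(block)
    print(parsed)
    print(predicted_location(parsed))
```

The sample block reuses the values reported for Example 352; lines that do not match the pattern, such as the "Final Results" heading, are skipped.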

Example 353

A DNA sequence (GBSx0384) was identified in S.agalactiae <SEQ ID 1143> which encodes the amino 
acid sequence <SEQ ID 1144>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.44 Transmembrane 45 - 61 ( 39 - 65)
INTEGRAL Likelihood = -8.12 Transmembrane 83 - 99 ( 77 - 101)
INTEGRAL Likelihood = -0.00 Transmembrane 2 - 18 ( 1 - 19)

Final Results

bacterial membrane Certainty=0.4376 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8559> which encodes amino acid sequence <SEQ ID 8560>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 2
SRCFLG: 0
McG: Length of UR: 8
Peak Value of UR: 2.23
Net Charge of CR: 1
McG: Discrim Score: 0.46
GvH: Signal Score (-7.5): -3.54
Possible site: 42
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 2 value: -8.44 threshold: 0.0
INTEGRAL Likelihood = -8.44 Transmembrane 37 - 53 ( 31 - 57)
INTEGRAL Likelihood = -8.12 Transmembrane 75 - 91 ( 69 - 93)
PERIPHERAL Likelihood = 2.76 200
modified ALOM score: 2.19
icm1 HYPID: 7 CFP: 0.438

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.4376 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12118 GB:Z99105 ycgQ [Bacillus subtilis] 
Identities = 100/290 (34%) , Positives = 159/290 (54%) , Gaps = 25/290 (8%) 

Query: 9 MIRFLILAGYFELSMYLKLSGKLNQYINTHYTYIAYIS^lvllSFILAIVQLIIWKNMKMH 68

M R L+L G+ +L SG L +YIN Y YL++I++ L IL VQ +++K+ + 

Sbjct: 1 MFRLLVLMGFTFFFYHLHASGNLTKYINMKYAYLSFIAIFLLAILTAVQAYLFIKSPEKS 60 

Query: 69 SHLHGKIA KSTSP MILVFPVLVGLLVPTVSLDSTTVSAKGYN 110 

H H + P ++ +FP++ G+ P +LDS+ V KG++

Sbjct: 61 GHHHDHDCGCGHDHEHDHEQNKPFYQRYLIYWFLFPLVSGIFFPIATLDSSIVKTKGFS 120 

Query: 111 FPLAAGSTGTVSQDGTRVQYLKPDTSTYFTSSAYEKEMQKELKKYKGSGTLTITTENYME 170 
FAS SQ QYL+PD S Y+ +Y+K+M++ KY +++T +++++ 

Sbjct: 121 FK-AMESGDHYSQ----TQYLRPDASLYYAQDSYDKQMKQLFNKYSSKKEISLTDDDFLK 175

Query: 171 VMELIYLYPEQFMDRQIQYTGFVY-NEPKHEGYQFIFRFGIIHCIADSGVYGLLTT-GNQ 228

ME IY YP +F+ R I++ GF Y ++ F+ RFGIIHCIADSGVYG+L 

Sbjct: 176 GMETIYNYPGEFLGRTIEFHGFAYKGNAINKNQLFVLRFGIIHCIADSGVYGMLVEFPKD 235 

Query: 229 KSYPDNTWVTVRGTIKSEYNQLLQQNLPVLHIEESRQVSKANNPYVYRVF 278

D+ W+ ++GT+ SEY Q + LPV+ + + + K ++PYVYR F 
Sbjct: 236 MDIKDDEWIHIKGTIASEYYQPFKSTLPVVKVTDWNTIKKPDDPYVYRGF 285 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1145> which encodes the amino acid
sequence <SEQ ID 1146>. Analysis of this protein sequence reveals the following:

Possible site: 60 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.33 Transmembrane 83 - 99 ( 74 - 101)
INTEGRAL Likelihood = -6.21 Transmembrane 42 - 58 ( 39 - 62)

Final Results

bacterial membrane Certainty=0.4333 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9115> which encodes the amino acid sequence
<SEQ ID 9116>. Analysis of this protein sequence reveals the following:

Possible cleavage site: 54
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.33 Transmembrane 75 - 91 ( 66 - 93)
INTEGRAL Likelihood = -6.21 Transmembrane 34 - 50 ( 31 - 54)
PERIPHERAL Likelihood = 2.76

Final Results

bacterial membrane Certainty=0.433 (Affirmative) < succ>
bacterial outside Certainty=0.000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below:

Identities = 208/279 (74%) , Positives = 244/279 (86%) , Gaps = 1/279 (0%) 
Query: 1 MFICGGNIMIRFLILAGYFELSMYLKLSGKLNQYINTHYTYIAYISM^SFILAIVQLII 60
+F CGG +MIRFLILAGYFEL+MYL+LSGKL+QYIN Y+YIAYISM+LSFILA+VQL
Sbjct: 1 LFTCGGALMIRFLILAGYFELTMYLQLSGKLDQYINVRYSYIAYISMILSFILALVQLYT 60

Query: 61 WIKNMKMHSHLHGKIAKSTSPMILVFPVLVGLLVPTVSLDSTTVSAKGYNFPLAAGSTGT 120
W+KN+K+HSHL GKIA+ TSP ILVFPVL+GLLVPTV+LDSTTVSAKGY FPLAAG++ T
Sbjct: 61 WMKNIKVHSHLTGKIARLTSPFILVFPVLIGLLVPTVTLDSTTVSAKGYTFPLAAGASKT 120

Query: 121 -VSQDGTRVQYLKPDTSTYFTSSAYEKEMQKELKKYKGSGTLTITTENYMEVMELIYLYP 179
VS DGT +QYLKPDTS YFT SAY+KEM++EL KYKG +TITTENYMEVMELIYLYP

Sbjct: 121 GVSDDGTTIQYLKPDTSLYFTKSAYQKEMRQELHKYKGKKPVTITTENYMEVMELIYLYP 180 

Query: 180 EQFMDRQIQYTGFVYNEPKHEGYQFIFRFGIIHCIADSGVYGLLTTGNQKSYPDNTWVTV 239 
++F+DR IQYTGFVYNEP H+ YQF+FRFGIIHCIADSGVYGLLTTGNQ SYP+NTW+TV 
Sbjct: 181 DEFLDRDIQYTGFVYNEPGHDNYQFLFRFGIIHCIADSGVYGLLTTGNQTSYPNNTWLTV 240

Query: 240 RGTIKSEYNQLLQQNLPVLHIEESRQVSKANNPYVYRVF 278 

+G + EY++ L+Q+LPVL + E Q + NNPYVYRVF 
Sbjct: 241 KGRLHMEYDKNLEQHLPVLQLAEVHQTKEPNNPYVYRVF 279 

20 

SEQ ID 8560 (GBS235d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total
cell extract is shown in Figure 146 (lanes 14 & 15; MW 48.5kDa). It was also expressed in E.coli as a
His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 146 (lanes 17 & 18; MW
23.4kDa), in Figure 150 (lane 15; MW 23kDa) and in Figure 182 (lane 5; MW 23kDa).

GBS235d-His was purified as shown in Figure 235, lanes 6-7.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 354 

A DNA sequence (GBSx0385) was identified in S.agalactiae <SEQ ID 1147> which encodes the amino 
acid sequence <SEQ ID 1148>. This protein is predicted to be signal recognition particle (ftsY). Analysis of
this protein sequence reveals the following:

Possible site: 57 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3301 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB06205 GB:AP001515 signal recognition particle (docking 
protein) [Bacillus halodurans] 
Identities = 175/304 (57%) , Positives = 227/304 (74%) 

Query: 233 EKYNRSLKKTRTGFSARLNAFLSNFRRVDEEFFEELEEMLILSDVGVNVATQLTEDLRYE 292
EK+ L+KTR F+ ++N + +R VDE+FFEELEE+LI +DVGV L E+L+ E
Sbjct: 20 EKFKAGLEKTRDSFAGKMNDLVYKYRSVDEDFFEELEEILIGADVGVTTVMDLVEELKDE 79

Query: 293 AKLENAKKSEDLKRVIVEKLVEIYEKDGIYNEAINFQEGLTVMLFVGVNGVGKTTSIGKL 352
+ +N K S+D++ +1 EKL E+ EK+G E GL+V+L VGVNGVGKTTSIGKL
Sbjct: 80 VRRQNIKDSKDIQP 1 1 SEKIAELLEKEOTEITEVNLQPAGLSVILVVGVNGVGKTTSIGKL 139

Query: 353 AHQYKSQGKKVMLVAADTFRAGAVAQLVEWGRRVDVPVVTGEEKADPASVVFDGMEKAVA 412
AH YK QGKKV+L A DTFRAGA+ QL WG R V V+ E +DPA+V+FD ++ A +
Sbjct: 140 AHMYKQQGKKVILAAGDTFRAGAIEQLEVWGERAGVDVIKQSEGSDPAAVMFDAIQAAKS 199

Query: 413 QGVDVLLIDTAGRLQNKENLMAELEKIGRIIKRVVPDAPHETLLALDASTGQNALSQAKE 472
+ D+L+ DTAGRLQNK NLM ELEK+ R+I R +P APHE L+ALDA+TGQNA+SQAK
Sbjct: 200 READILICDTAGRLQNKVNLMKELEKVKRVISREIPGAPHEVLIALDATTGQNAMSQAKT 259

Query: 473 FSKITPLTGLILTKIDGTAKGGVVLAIRQELDIPVKFIGFGEKIDDIGEFNSEDFMRGLL 532
F + T +TG+ILTK+DGTAKGG+VLAIR ELDIPVKF+G GEKIDD+ F+SE F+ GL
Sbjct: 260 FKETTDVTGIILTKLDGTAKGGIVLAIRHELDIPVKFVGLGEKIDDLQPFDSEQFVYGLF 319

Query: 533 EGIL 536 
+ ++ 

Sbjct: 320 KDMV 323 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1149> which encodes the amino acid 
sequence <SEQ ID 1150>. Analysis of this protein sequence reveals the following:

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4384 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 339/549 (61%) , Positives = 404/549 (72%) , Gaps = 46/549 (8%) 
Sbjct: 52 92

Query: 121 178
Sbjct: 93 -AADKTDLK--VSELSQSTASEPKDLVDQPWEQFPTKQAQADASNDSANEEAVDTSKEQ 149

Query: 179 SAADAFLADYYAKRKAIEKEISSNSLST---------DESEFSEAQEVLSQSQA--DTIK 227
S++ + DYY ++ A+EK + + +T E++ S + E SQ++A DTI
Sbjct: 150 SSSQQVMEDYYRRKARLEKSLQEKAAATVPVMPEEVPQENQASTSAEA-SQNKATHDTIP 208

Query: 228 AESQEEKYNRSLKKTRTGFSARLNAFLSNFRRVDEEFFEELEEMLILSDVGVNVATQLTE 287
E+ +EKY RSLKKTRTGFSARLN+F +NFRRVDEEFFE+LEEMLILSDVGV+VAT LTE
Sbjct: 209 -ETDQEKYKRSLKKTRTGFSARLNSFFANFRRVDEEFFEDLEEMLILSDVGVHVATTLTE 267

Query: 288 DLRYEAKLENAKKSEDLKRVIVEKLVEIYEKDGIYNEAINFQEGLTVMLFVGVNGVGKTT 347
+LRYEAKLENAKK + LKRVIVEKLV+IYEKDG YNEAIN+Q+GLTVMLFVGVNGVGKTT
Sbjct: 268 ELRYEAKLENAKKPDALKRVIVEKLVDIYEKDGRYNEAINYQDGLTVMLFVGVNGVGKTT 327

Query: 348 SIGKLAHQYKSQGKKVMLVAADTFRAGAVAQLVEWGRRVDVPVVTGEEKADPASVVFDGM 407
SIGKLA++YK +GKKVMLVAADTFRAGAVAQLVEWGRRVDVPV+TG EKADPASVVFDGM
Sbjct: 328 SIGKLAYRYKQEGKKVMLVAADTFRAGAVAQLVEWGRRVDVPVITGPEKADPASVVFDGM 387

Query: 408 EKAVAQGVDVLLIDTAGRLQNKENLMAELEKIGRIIKRVVPDAPHETLLALDASTGQNAL 467
EKAVA+GVD+LLIDTAGRLQNKENLMAELEK+GRIIKRV+PDAPHETLLALDASTGQNAL
Sbjct: 388 EKAVAKGVDILLIDTAGRLQNKENLMAELEKMGRIIKRVLPDAPHETLLALDASTGQNAL 447

Query: 468 SQAKEFSKITPLTGLILTKIDGTAKGGVVLAIRQELDIPVKFIGFGEKIDDIGEFNSEDF 527
SQAKEFSKITPLTGLILTKIDGTAKGGVVLAIRQELDIPVKFIGFGEK+DDIGEF+SEDF
Sbjct: 448 SQAKEFSKITPLTGLILTKIDGTAKGGVVLAIRQELDIPVKFIGFGEKVDDIGEFHSEDF 507

Query: 528 MRGLLEGIL 536
M+GLLEGIL
Sbjct: 508 MKGLLEGIL 516

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
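
The relationship between a pair of aligned rows and the identity and gap counts in an alignment summary can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the program that produced the alignments in this specification: the function names row_statistics and identity_midline are hypothetical, and the midline it builds marks identical residues only. The "+" symbols printed in the alignments above mark conservative substitutions, which depend on a substitution matrix and are deliberately left out here.

```python
def row_statistics(query, subject):
    """Return (identities, gap_columns, columns) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same number of columns")
    pairs = list(zip(query, subject))
    # A column is an identity when both rows carry the same residue (not a gap).
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    # A column is a gap column when either row carries "-".
    gap_columns = sum(1 for q, s in pairs if q == "-" or s == "-")
    return identities, gap_columns, len(pairs)


def identity_midline(query, subject):
    """Build a midline showing the shared residue where rows agree, else a space."""
    return "".join(q if q == s and q != "-" else " " for q, s in zip(query, subject))


if __name__ == "__main__":
    # Final row of the GAS/GBS alignment of Example 354.
    query, subject = "MRGLLEGIL", "MKGLLEGIL"
    print(row_statistics(query, subject))
    print(identity_midline(query, subject))
```

Summing the per-row identities and gap columns over all rows of an alignment gives the numerators of the "Identities" and "Gaps" fields, and the total column count gives their shared denominator.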

Example 355 

A DNA sequence (GBSx0386) was identified in S.agalactiae <SEQ ID 1151> which encodes the amino 
acid sequence <SEQ ID 1152>. Analysis of this protein sequence reveals the following:

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3592 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA62048 GB:L10328 f270 [Escherichia coli]

Identities = 101/273 (36%) , Positives = 160/273 (57%) , Gaps = 10/273 (3%) 

Query: 4 IKILALDLDGTLFTTDKKVSEENKVALKAAREKGIKVVITTGRPLKAIGNLLEDLELVSD 63
IK++A+D+DGTL D +S K A+ AAR +G+ VV+TTGRP + N L++L +
Sbjct: 3 IKLIAIDMDGTLLLPDHTISPAVKNAIAAARARGVWVLTTGRPYAGVHNYLKELHMEQP 62

Query: 64 EDYSITFNGGLVQQNT-GKIIAKTAMTRQEVEDIHEELYQVGLPTDILSEGTVYS----I 118
DY IT+NG LVQ+ G +A+TA++ + + + +VG L T+Y+ I
Sbjct: 63 GDYCITYNGALVQKAADGSTVAQTALSYDDYRXLEKLSREVGSHFHALDRTTLYTANRDI 122

Query: 119 ANKGHHSQYHLANPLLEFIEVDDLEQVPKDVVYNKIVSVIDATYLDQQIAKLPDRLKVDY 178
+ H + PL+ F E E++ + + K++ + + LDQ IA++P +K Y
Sbjct: 123 SYYTVHESFVATIPLV-FCEA---EKMDPNTQFLKVMMIDEPAILDQAIARIPQXVKEKY 178

Query: 179 EMFKSRDIILELMPKGVHKAVGLELLTKHLGLDSSQVMAMGDEANDLSMLEWAGLGVAMA 238
+ KS LE++ K V+K G++ L LG+ ++MA+GD+ ND++M+E+AG+GVAM
Sbjct: 179 TVLKSAPYFLEILDKRVNKGTGVKSLADVLGIKPEEIMAIGDQENDIAMIEYAGVGVAMD 238

Query: 239 NGIPEAKAIAKATTICNNDESGVAEAIGKYILS 271
N IP K +A T +N E GVA AI KY+L+
Sbjct: 239 NAIPSVKEVANFVT-KSNLEDGVAFAIEKYVLN 270

A related DNA sequence was identified in S.pyogenes <SEQ ID 1153> which encodes the amino acid 
sequence <SEQ ID 1154>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3502 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 180/273 (65%) , Positives = 218/273 (78%) , Gaps = 1/273 (0%) 



Query: 3 DIKILALDLDGTLFTTDKKVSEENKVALKAAREKGIKVVITTGRPLKAIGNLLEDLELVS 62
+I+ILALDLDGTL+ T+K V++ NK AL AAREKG+KVVITTGRPLKAIGNLLE+L+L+
Sbjct: 2 NIRILALDLDGTLYNTEKIVTDANKKAIJWiREKGVKVVITTGRPLKAIGNLLEELDLLD 61

Query: 63 DEDYSITFNGGLVQQNTGKIIAKTAMTRQEVEDIHEELYQVGLPTDILSEGTVYSIANK- 121
+DYSITFNGGLVQ+NTG++L K++++ +V I + L VGLPTDI+S G VYSI +K
Sbjct: 62 HDDYSITFNGGLVQRNTGEVLDKSSLSFDQVCQIQQALEAVGLPTDIISGGDVYSIPSKD 121

Query: 122 GHHSQYHLANPLLEFIEVDDLEQVPKDVVYNKIVSVIDATYLDQQIAKLPDRLKVDYEMF 181
G HSQYHLANPLL FIEV + ++PKD+ YNKIV+V D +LDQQI KL L D+E F
Sbjct: 122 GRHSQYHLANPLLTFIEVTSVAELPKDITYNKIVTVTDPDFLDQQIIKLSPSLFEDFEAF 181

Query: 182 KSRDIILELMPKGVHKAVGLELLTKHLGLDSSQVMAMGDEANDLSMLEWAGLGVAMANGI 241
KSRDII E+MPKG+ KA GL LL +HLGLD+ VMAMGDEAND +MLEWAGLGVAMANG+
Sbjct: 182 KSRDIIFEIMPKGIDKAFGLNLLCQHLGLDARHVMAMGDEANDFAMLEWAGLGVAMANGV 241

Query: 242 PEAKAIAKATTICNNDESGVAEAIGKYILSEEN 274
AKA A A T NDESGVAEA+ +IL EE+
Sbjct: 242 SGAKADADAVTTLTNDESGVAEAVKTFILEEES 274

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 356 

A DNA sequence (GBSx0387) was identified in S.agalactiae <SEQ ID 1155> which encodes the amino
acid sequence <SEQ ID 1156>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.4648 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAA35556 GB:D90723 Hypothetical 30.2 kd protein in idh-deoR 
intergenic region. [Escherichia coli] 
Identities = 91/264 (34%) , Positives = 146/264 (54%) , Gaps = 4/264 (1%) 

Query: 2 IKLVATDMDGTFLDENGTYDKKRLANVLKKFKEQGIVFTAASGRSLLSLEQLFADFRDQM 61

IKL+A DMDGTFL + TY+++R ++ K QGI F ASG L F + +++ 

Sbjct: 4 IKLIAVD^IDGTFLSDQKTYNRERFMAQYQQMKAQGIRFVVASGNQYYQLISFFPEIANEI 63 

Query: 62 AFIAENGSAAVLFNRIiAYEQHLSREQYLDIIDHLSKSPYMENNEYVLSGKDGAYILSDAN 121 
AF+AENG V + + LS++ + +++HL P + E + GK+ AY L +

Sbjct: 64 AFVAENGGWWSEGKDVFNGELSKDAFATWEHLLTRPEV EIIACGKNSAYTLKKYD 120 

Query: 122 PDYIEFITHYYDNLQKVSHFEDVDDI I FKVTANFTEETVRQAEEWVNQAI - PYATAVTTG 180 
YY L+ V +F++++DI FK N ++E + Q ++ +++AI +V TG 

Sbjct: 121 DAMKTVAEMYYHRLEYVDNFDNLEDIFFKFGLNLSDELIPQVQKALHEAIGDIMVSVHTG 180

Query: 181 FKSIDIILSSVNKRNGLEHLCEQYGIRAEEVLSFGDNINDLEMLEWSGKAIATENARPEV 240 

SID+I+ V+K NGL L + +GI EV+ FGD ND+EML +G + A ENA V 
Sbjct: 181 NGSIDLIIPGVHKANGLRQLQKLWGIDDSEWVFGDGGNDIEMLRQAGFSFAMENAGSAV 240 






Query: 241 KEIADCI IGHHNNQAVMAYLESMV 264 

A G +N + V+ ++ ++ 
Sbjct: 241 VAAAKYRAGSNNREGVLDVIDKVIi 264 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1157> which encodes the amino acid
sequence <SEQ ID 1158>. Analysis of this protein sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3401 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below:

Identities = 138/265 (52%) , Positives = 193/265 (72%) , Gaps = 1/265 (0%)

Query: 1 MIKLVATDMDGTFLDENGTYDKKRIANVLKKFKEC^IVFTAASGRSLLSLEQLFADFRDQ 60 

MIKL+ATDMDGTFL E+GTY++++LA +L K E+GI+F +SGRSLL+++QLF F DQ 
Sbjct: 1 MIKLIATDMDGTFLAEDGTYNQEQIoAALLPKLREKGILFAVSSGRSLLAIDQLFEPFLDQ 60 

Query: 61 mFIAENGSAAVLFNR^YEQHLSREQYLDIIDHLSKSPyMENNEYVLSGKDGAyiLSDA 120 

+A IAENGS + + +++EQY ++ + +P+ V SG+ AYIL A 

Sbjct: 61 IAVIAENGSWQYRGEILFADMMTKEQYTEVAKKIIiANPHYVETGMVFSGQKAAYILKGA 120 

Query: 121 NPDYIEFITHYYDNLQKVSHFEDVD-DIIFKVTANFTEETVRQAEEWVNQAIPYATAVTT 179 

+ +YI+ HYY N++ ++ FED++ D IFKV+ NFT TV + +W+NQA+ PYATAVTT 
Sbjct: 121 SEEYIQKTKHYYANVKVINGFEDMEOTDAIFKVSTNFTGHTVLEGSDWIJSIQALPYATAvTT 180 

Query: 180 GFKSIDIILSSVNKRNGLEHLCEQYGIRAEEVLSFGDNINDLEMLEWSGKAIATENARPE 239

GF SIDIIL VNK G+EHLC+ GI+ E ++FGDN ND +MLE++G+AIATENARPE 
Sbjct: 181 GFDS ID I I LKE VNKGFGMEHLCQALGI KKAETIAFGDNFNDYQMLEFAGRAI ATENARPE 240 

Query: 240 VKEIADCI IGHHNNQAVMAYLESMV 264 
+K I+D +IGH N+ AV+ YL+ +V

Sbjct: 241 I KVI SDQVIGHCNDGAVLTYLKGLV 265 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 357

A DNA sequence (GBSx0388) was identified in S.agalactiae <SEQ ID 1159> which encodes the amino 
acid sequence <SEQ ID 1160>. Analysis of this protein sequence reveals the following:






Possible site: 18 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.2428 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 358

A DNA sequence (GBSx0389) was identified in S.agalactiae <SEQ ID 1161> which encodes the amino 
acid sequence <SEQ ID 1162>. This protein is predicted to be p115 protein (smc). Analysis of this protein
sequence reveals the following: 

Possible site: 55 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.99 Transmembrane 1092 - 1108 ( 1088 - 1110)

Final Results

bacterial membrane Certainty=0.2996 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9713> which encodes amino acid sequence <SEQ ID 9714> 
was also identified. 
The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB13467 GB:Z99112 chromosome segregation SMC protein homolg 
[Bacillus subtilis] 

Identities = 458/1193 (38%) , Positives = 728/1193 (60%) , Gaps = 27/1193 (2%) 

Query: 1 MFLKEIEMQGFKSFADKTKVEFDQGVTAVVGPNGSGKSNITESLRWALGESSAKSLRGGK 60

MFLK +++ GFKSFA++ V+F +GVTAVVGPNGSGKSNIT+++RW LGE SA+SLRGGK
Sbjct: 1 MFLKRLDVIGFKSFAERISVDFVKGVTAVVGPNGSGKSNITDAIRWVLGEQSARSLRGGK 60

Query: 61 MPDVIFAGTENRKPLNYAQVSVTLDNSDHFIENIADEVRVERRIFRNGDSEYLIDGRKVR 120 

M D+IFAG+++RK LN A+V++TLDN DHF+ EV V RR++R+G+SE+LI+ + R 

Sbjct: 61 MEDIIFAGSDSRKRLNIAEVTLTLDNDDHFLPIDFHEVSVTRRVYRSGESEFLINNQPCR 120 

Query: 121 LRDIHDLFMDTGLGRDS FS 1 1 SQGRVEAI FNSKPEERRAI FEEAAGVLKYKTRKKETQSK 180 

L+DI DLFMD+GLG+++FSIISQG+VE I +SK E+RR+ I FEEAAGVLKYKTRKK+ ++K 
Sbjct: 121 LKDI IDLFMDSGLGKEAFS I I SQGKVEE I LS S KAEDRRS I FEEAAGVLKY KTRKKKAENK 180 

Query: 181 LEQTQGNLDRLEDIIYELDMQVQPLEKQASIAKRFLVLDEERQGLHLSILIEDILQHQSD 240 

L +TQ NL+R+EDI++EL+ QV+PL+ QASIAK +L +E + + +++ DI + 
Sbjct: 181 LFETQDNLNRVEDILHELEGQVEPLKIQASIAKDYLEKKKELEHVEIALTAYDIEKLHGK 240 

Query: 241 LTTVEEKLLTVRKELATYYQQRQSLEDENQSLKQKRHHLSEEIEAKQ1ALLDVTKLKSDL 300 
+T++EK+ ++E +E + ++ K LE+ QLL++ L 

Sbjct: 241 WSTLKEKVQMAKEEEIAESSAISAKEAKIEDTRDKIQALDESVNELQQVLLVTSEELEKL 300

Query: 301 ERQIDLIRLESNQKAEKKEEAGQRIAELEIKAKDCSDQITQKNIELTTLSEKIAQIRSEI 360 

E + ++++ + +E+ + + + + K ++++++ TL ++ Q+R+++ 

Sbjct: 301 EGRKEVLKERKKNAVQNQEQLEEAIVQFQQKETVLKEELSKQEAVFETLQAEVKQLRAQV 360 

Query: 361 VSTESSLERFSTNPDQIIEKLREDFVTLMQEEADTSNMiTALLADIENQKQASQAKSQEI 420 

+ +L + N ++ IE+L+ D+ L+ +A N L LL D +Q + + + 
Sbjct: 361 KEKQQALSLHNENVEEKIEQLKSDYFELLNSQASIRNEL-QLLDDQMSQSAVTLQRLADN 419 

Query: 421 QEVSKTOEVLKSNAKVALE-RFEAAKKNVRQLLSHYQDLGQTLQ^EGEYKNQQSILFDH 479 

E SKAEF +++ + Y+D+ , + + +Y+ +S L+ 

Sbjct: 420 NEKHLQERHDISARKAACETEFARIEQEIHSQVGAYRDMQTKYEQKKRQYEKNESALYQA 479 

Query: 480 LDEIKSKQARISSLESILKNHSNFYAGVKSVLQAKDQLGGIIGAVSEHLSFDKHYQTALE 539 

++ +++ LE++ + S FY GVK VL+AK++LGGI GAV E +S ++ Y+TA+E 
Sbjct: 480 YQYVQQARSKKDMLETMQGDFSGFYQGVKEVLKAKERLGGIRGAVLELISTEQKYETAIE 539 

Query: 540 IALGGSSQHIIVEDESAAKRSIAFLKKNRQGRATFLPLTTIKPRELAQHYLSKLQSSQGF 599 

IALG S+QH++ +DE +A+++I +LK+N GRATFLPL+ 1+ R+L F 
Sbjct: 540 1ALGASAQHVVTDDEQSARKAIQYLKQNSFGRATFLPLSVIRDRQLQSRDAETAARHSSF 599 

Query: 600 LGIASELWYDQRLSNIFKNNLGLTAIFDTVDNAWAARQLOTQVRLVTLDGTELRPGGS 659 

LG+ASELVT+D ++ +N LG I + + AN A+ L ++ R+VTL+G + PGGS 
Sbjct: 600 LGVASELVTFDPAYRSVIQNLLGTVLITEDLKGANELAKLLGHRYRIVTLEGDVVNPGGS 659 

Query: 660 YSGGANRQNNTVFI--KPELDNLKKELKQAQSKQLIQEKEVATLLEQLKEKQETLAQLKN 717 

+GGA ++ N + EL+++ K L + + K + E+EV TL +++ ++ IA L+ 
Sbjct: 660 MTGGAVKKKNNSLLGRSRELEDVTKRI^MEEKTALLEQEVKTLKHSIQDMEKKIADLRE 719 

Query: 718 DGEQARLEEQRADIEYQQLSEKLADLNKLYNGLQLSSGALEQTTSENE--KNRLEKELEQ 775 

GE RL++Q + +L ++N AL ++ E + K +LE+EL 

Sbjct: 720 TGEGLRLKQQDVKGQLYELQVAEKNINTHLELYDQEKSALSESDEERKVRKRKLEEELSA 779 

Query: 776 FAIKKEELTTSIAQIKEDKDSIQEK\7NKLTTLLSEaQLEERDL]^QKFERANCTRL 832 

+ K ++L I ++ + K + +L+ L+E ++ K E N RL 

Sbjct: 780 VSEKMKQLEEDIDRLTKQKQTQSSTKESLSNELTELKIAAAKKEQACKGEEDNLARLKKE 839 

Query: 833 EITLSEIKRDISNLQTLLSHQDSQLDKEELPRIEKQLLQVNNRRENDEEKLVSLRF 888 

E+ L E K D+S L + +S S E++L + + ND+ K + L 

Sbjct: 840 LTETELALKEAKEDLSFLTSEMSSSTSG EEKLEE?AKHKLNDKTKTIELIA 890 

Query: 889 ELEDCEAALDDLAASLAKEGQKNESLIRQQAQL ESQCEQLSQQLMI FSRQLSEDYQ 944 

D L + +E ++ + L +Q+ L E + ++ +L + L E+Y 
Sbjct: 891 LRRDQRIKLQHGLDTYERELKEMKRLYKQKTTLLKDEEVKLGRMEVELDNLLQYLREEYS 950

Query: 945 MTLDEAKVKANVLEDILMAREQLKSLQAKIKMiGPVNIDAIAQFEEVHERLTFLNTQRDD 1004 

++ + AK K + D AR+++K ++ 1+ LG VN+ +1 +FE V+ER FL+ Q++D 
Sbjct: 951 LSFEGAKEKYQLETDPEEARKRVKLIKLAIEEIK3TVNLGSIDEFERVNERYKFLSEQKED 1010 

Query: 1005 LVHAKNLLLETITDMDDEVKTRFKSTFEAIRHSFKETWQMFGGGSADLILTE-GDLLSA 1063 

L AKN L + I +MD+E+ RF TF IR F + F +FGGG A+L LT+ DLL + 
Sbjct: 1011 LTEAKNTLFQVIEEMDEEMTKRFNDTFVQIRSHFDQVFRSLFGGGRAELRLTDPNDLLHS 1070 

Query: 1064 GVDISVQPPGKKIQSLNLMSGGEKALSALALLFAIIRVKTIPFVILDEVEAALDEANVKR 1123 

GV+I QPPGKK+Q+LNL+SGGE+AL+A+ALLF+I++V+ +PF +LDEVEAALDEANV R 
Sbjct: 1071 GWIIAQPPGKKLQNLNLLSGGERALTAIALLFSILKVRPVPFCVLDEVEAALDEANVFR 1130 

Query: 1124 FGDYLNRFDKSSQFIVVTHRKGTMSAADSIYGVTMQESGVSKIVSVKLKEAQE 1176

F YL ++ +QFIV+THRKGTM AD +YGVTMQESGVSK++SVKL+E +E 
Sbjct: 1131 FAQYLKKYSSDTQFIVITHRKGTMEEADVLYGVTMQESGVSKVISVKLEETKE 1183 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1163> which encodes the amino acid
sequence <SEQ ID 1164>. Analysis of this protein sequence reveals the following:

Possible site: 15 






>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.99 Transmembrane 1092 -1108 (1088 -1110) 



Final Results 

bacterial membrane Certainty=0.2996 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB13467 GB:Z99112 chromosome segregation SMC protein homolg 
[Bacillus subtilis] 

Identities = 441/1192 (36%) , Positives = 729/1192 (60%) , Gaps = 25/1192 (2%) 

Query: 1 MFLKEIELEGFKSFADKTKIEFDKGVTAVVGPNGSGKSNITESLRWALGESSAKNLRGGK 60

MFLK +++ GFKSFA++ ++F KGVTAVVGPNGSGKSNIT+++RW LGE SA++LRGGK
Sbjct: 1 MFLKRLDVIGFKSFAERISVDFVKGVTAVVGPNGSGKSNITDAIRWVLGEQSARSLRGGK 60

Query: 61 MPDVIFAGTQNRNPLNYAKVAVVLDNSDHFIKTAKKEIRVERHIYRNGDSDYLIDGRKVR 120

M D+IFAG+ +R LN A+V + LDN DHF+ E+ V R +YR+G+S++LI+ + R 

Sbjct: 61 MED1IFAGSDSRKRIJTOAEVTLTLDNDDHFLPIDFHEVSVTRRVYRSGESEFLINNQPCR 120 

Query: 121 LRDIHDLFMDTGLGRDSFSIISQGRVEEIFNSKPEERRAIFEEAAGVLKYKTRKKETQIK 180 
L+DI DLFMD+GLG+++FSIISQG+VEEI +SK E+RR+IFEEAAGVLKYKTRKK+ + K

Sbjct: 121 LKD I IDLFMDSGLGKEAFS I I SQGKVEEILSSKAEDRRS I FEEAAGVLKYKTRKKKAENK 180 

Query: 181 LNQTQDNLDRLEDIIYELDTQLAPLEKQAKVAKQFLELDANRKQLQLDILVKDIDIAQER 240 
L +TQDNL+R+EDI++EL+ Q+ PL+ OA +AK +LE + +++ + DI+ + 

Sbjct: 181 LFETQDNLNRVEDILHELEGQVEPLKIQASIAKDYLEKKKELEHVEIALTAYDIEKLHGK 240

Query: 241 QTKDTEALAALQQDIASYYAKRQSMEEDYQKFKQKKQVLSQESDQTQTTLLELTKLIADL 300 

+ E + +++ ++ +E ++KQL+++QLL+++L 
Sbjct: 241 WSTLKEKVQMAKEEELAESSAISAKEAKIEDTRDKIQALDESVNELQQVLLVTSEELEKL 300 


Query: 301 EKQIELVKLESGQEAEKKAEAKKHLEQLQEQLDGFQAEEKQCTEQLLH IDQQL 353 

E + E++K E+K A ++ EQL+E + FQ +E E+L + ++ 

Sbjct: 301 EGRKEVLK ERKKNAVQNQEQLEEAIVQFQQKETVLKEELSKQEAVFETLQAEV 353 

Query: 354 CDVKQQLNELSNALERFSSDPDQLMETLREEFVLLMQKEAALSNQLTALKAHLDKEKQAR 413

++ Q+ E AL + + ++ +E L+ ++ L+ +A++ N+L L + + 
Sbjct: 354 KQLRAQVTffiKQQALSLHNENVEEKIEQLKSDYFELIMSQASIRNELQLLDDQMSQSAVTL 413 

Query: 414 QHKAQEYQLLVTKLDQLNDESQKAQAHYKAQKEQVEMLLQNYQEGDKRVQELERDYQLNQ 473 
QA+++++ + + ++++ + Y++ + ++ +R Y+ N+
Sbjct: 414 QRLADNNEKHLQERHDISARKAACETEFARIEQEIHSQVGAYRDMQTKYEQKKRQYEKNE 473

Query: 474 ERLFDLLDQKKGPCEARKASLESIQKSHSQFYAGVRAVLQSQKKLGGIIGAVSEHLSFDSD 533 

L+ + ++K LE++Q S FY GV+ VL+++++LGGI GAV E +S + 

Sbjct: 474 SALYQAYQYVQQARSKKDMLETMQGDFSGFYQGVKEVLKAKERLGGIRGAVLELISTEQK 533 

Query: 534 YQTALEVALGANSQHIIOTDEAAAKRAIAYLKKNRQGRATFLPLTTIKARSLSEHYHRQL 593 

Y+TA+E+ALGA++QH++ DE +A++AI YLK+N GRATFLPL+ 1+ R Ii 
Sbjct: 534 YETAIEIALGASAQHVvTDDEQSARKAIQYLKQNSFGRATFLPLSVIRDRQLQSRDAETA 593 

Query: 594 ATCEGYLGTAESLIRYDDSLSAIIQNLLSSTAIFETIDQANIAARLLGYKWIVTLDGTE 653 

A +LG A L+ +D + ++IQNLL + I E + AN A+LLG++ RIVTL+G 
Sbjct: 594 ARHSSFLGVASELVTFDPAYRSVIQNLLGTVLITEDLKGANELAKLLGHRYRIVTLEGDV 653 

Query: 654 LRPGGSFSGGANRQSNTTFI--KPELEQISEELTRLVEQLKITEKEVAALQSDLIAKKEE 711 

+ PGGS +GGA ++ N + + ELE +++ L + E+ + E+EV L+ + +++ 
Sbjct: 654 WPGGSMTGGAVKKKfflSISLLGRSRELEDVTKRIAEMEEKTALLEQEVKTLKHSIQDMEKK 713 

Query: 712 LTQLK1AGDQARIAEQ--RAQMAYQQLQEKQEDSKAL1AALDQSQTTHSDESLLAEQARI 769 

L L+ G+ RL +Q + Q+ Q+ EK ++ L ++S + SDE + ++ 

Sbjct: 714 LADLRETGEGLRLKQQDVKGQLYELQVAEKNINTHLELYDQEKSALSESDEERKVRKRKL 773 

Query: 770 EEALTAIAKKKNALTCDIDDIKENKDLIRQKTQNIHQALSQARLQERDLLNEKKFEQANQ 829 

EE L+A+++K L DID + + K +++ L++ ++ K E+ N 

Sbjct: 774 EEELSAVSEKMKQLEEDIDRLTKQKQTQSSTKESLSNELTELKIAAAKKEQACKGEEDNL 833 

Query: 830 SRLRTQLKQCQQNILKLESILNNNVSQDSIQRLPQWQKQLQDATEHKSGAQKRLVQLRFE 889 

+RL+ +L + + + + + L+ S+ S +++L++A +HK + + ++L 

Sbjct: 834 ARI1KKELTETEIALKEAKEDLSFLTSEMSSS--TSGEEKLEEAAKHKLNDKTKTIELIAL 891 

Query: 890 IEDYEARLEETAEKITKESEKNDTFIRRQTKL ETHLEQVANRLRAYAKSLSEDFQM 945 

D +L+ + +E ++ +++T Ii E L ++ L + L E++ + 

Sbjct: 892 RRDQRIKLQHGLDTYERELKEMKRLYKQKTTLLKDEEVKLGRMEVELDNLLQYLREEYSL 951 

Query: 946 TLADAKEVTNSIDHLESAKEKLHHLQKTIRALGPINSDAINQYEEVHERLTFLTSQKTDL 1005 

+ AKE E A++++ ++ I LG +N +I+++E V+ER FL+ QK DL 

Sbjct: 952 SFEGAKEKYQLETDPEEARKRVKLIKLAIEELGTVNLGSIDEFERVNERYKFLSEQKEDL 1011 

Query: 1006 TKAKNLLLETINSMDSEVKARFKVTFEAI QKS FKET FTQMFGGGSADLVLTE - TDLLSAG 1064 

T+AKN L + I MD E+ RF TF 1+ F + F +FGGG A+L LT+ DLL +G 
Sbjct: 1012 TEAKNTLFQVIEEMDEEMTKRFNDTFVQIRSHFDQVFRSLFGGGRAELRLTDPNDLLHSG 1071 

Query: 1065 IEISVQPPGKKIQSIjNLMSGGEKALSAIALLFAIIRVKTIPFVILDEVEAALDEANVKRF 1124 

+EI QPPGKK+Q+LNL+SGGE+AL+A+ALLF+I++V+ +PF +LDEVEAALDEANV RF 
Sbjct: 1072 VEIIAQPPGKKLQNLNLLSGGERALTAIALLFSILKVRPVPFCVLDEVEAALDEANVFRF 1131 

Query: 1125 GDFLNRFDKDSQFIVVTHRKGTMAAADSIYGITMQESGVSKIVSVKLKEAQE 1176 

+L ++ D+QFIV+THRKGTM AD +YG+TMQESGVSK++SVKL+E +E 
Sbjct: 1132 AQYLKKYSSDTQFIVITHRKGTMEEADVLYGVTMQESGVSKVISVKLEETKE 1183 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 732/1179 (62%) , Positives = 911/1179 (77%) 

Query: 1 MFLKEIEMQGFKSFADKTKVEFDQGVTAWGPNGSGKSNITESLRWALGESSAKSLRGGK 60 

MFLKEIE++GFKSFADKTK+EFD+GVTAVVGPNGSGKSNITESLRWALGESSAK+LRGGK 
Sbjct: 1 MFLKEIELEGFKSFADKTKIEFDKGvTAWGPNGSGKSNITESLRWALGESSAKNLRGGK 60 

Query: 61 MPDVIFAGTEJSIRKPIiOTAQVSvTLDNSDHFIENIADEVRVERRIFRNGDSEYLIDGRKVR 120 

MPDVI FAGT+NR PLNYA+V+V LDNSDHFI+ E+RVER I+RNGDS+YLIDGRKVR 
Sbjct: 61 MPDVIFAGTQNRNPLNYAKVAVVLDNSDHFIKTAKKEIRVERHIYl^GDSDYLIDGRKVR 120 

Query: 121 LRD IHDLFMDTGLGRDS FS 1 1 SQGRVEAI FNSKPEERRAIFEEAAG VLKYKTRKKETQSK 180 

LRDIHDLFMDTGLGRDSFSIISQGRVE IFNSKPEERRAIFEEAAGVLKYKTRKKETQ K 
Sbjct: 121 LRDIHDLFMDTGLGRDSFSIISQGRVEEIFNSKPEERRAIFEEAAGvLKYKTRKKETQIK 180 



Query: 181 LEQTQGNLDRLEDI IYELDMQVQPLEKQASIAKRFLVLDEERQGLHLS ILIEDILQHQSD 240
L QTQ NLDRLEDI IYELD Q+ PLEKQA +AK+FL LD R+ L L IL++DI Q
Sbjct: 181 MQTQDNLDRLEDIIYELDTQMPLEKCjaKVAKQFLELDANRKQLQLDILVKDIDIAQER 240

Query: 241 LTTVEEKLLTWKEIATYYQQRQSLEDENQSLKQKRHHLSEEIEAKQLALLDVTKLKSDL 300 

T EL ++++LA+YY +RQS+E++ Q KQK+ LS+E + Q LL++TKL +DL 
Sbjct: 241 QTKDTEALAALQQDLASYYAKRQSMEEDYQKFKQKKQVLSQESDQTQTTLLELTKLIADL 300 

Query: 301 ERQIDLIRLESNQKAEKKEEAGQRLAELEIKAKDCSDQITQKNIELTTLSEKIAQIRSEI 360 

E+QI+L++LES Q+AEKK EA + L +L+ + + Q +L + +++ ++ ++ 

Sbjct: 301 EKQIELVKLESGQEAEKKAEAKKHLEQLQEQLDGFQAEEKQCTEQLLHIDQQLCDVKQQL 360 

Query: 361 VSTESSLERFSTNPDQIIEKLREDFVTLMQEEADTSNALTALIADIENQKQASQAKSQEI 420 

++LERFS++PDQ++E LRE+FV LMQ+EA SN LTAL A ++ +KQA Q K+QE 
Sbjct: 361 NELSNALERFSSDPDQLMETLREEFVLLMQKEAALSNQLTALKAHLDKEKQARQHKAQEY 420 

Query: 421 QEVSKNLEVLKSNAKVALERFEAAKKNVRQLLSHYQDLGQTLQNLEGEYKNQQSILFDHL 480 

Q + L+ L ++ A ++A K+ V LL +YQ+ + +Q LE +Y+ Q LFD L 
Sbjct: 421 QLLVTKLDQLNDESQKAQAHYKAQKEQVEMLLQNYQEGDKRVQELERDYQLNQERLFDLL 480 

Query: 481 DEIKSKQARISSLESILKNHSNFYAGVKSVLQAKDQLGGIIGAVSEHLSFDKHYQTALEI 540 

D+ K K+AR +SLESI K+HS FYAGV++VLQ++ +LGGI IGAVSEHLSFD YQTALE+ 
Sbjct: 481 DQKKGKEARKASLESIQKSHSQFYAGVRAVLQSQKKLGGIIGAVSEHLSFDSDYQTALEV 540 

Query: 541 ALGGSSQHIIVEDESAAKRSIAFLKKNRQGRATFLPLTTIKPRELAQHYLSKLQSSQGFL 600 

ALG +SQHIIV DE+AAKR+ IA+LKKNRQGRATFLPLTTIK R L++HY +L + +G+L 
Sbjct: 541 ALGANSQHIIVTDFAAAKRAIAYLKKNRQGRATFLPLTTIKARSLSEHYHRQLATCEGYL 600 

Query: 601 GIASELVTYDQRLSNIFKNNLGLTAIFDTVDNANVAARQLNYQVRLVTLDGTELRPGGSY 660 

G A L+ YD LS I +N L TAIF+T+D AN+AAR L Y+VR+VTLDGTELRPGGS+ 
Sbjct: 601 GTAESLIRYDDSLSAIIQNLLSSTAIFETIDQANIAARLLGYKVRIVTLDGTELRPGGSF 660 

Query: 661 SGGANRQNOTVFIKPELDNLKKELKQAQSKQLIQEKEVATLLEQLKEKQETIAQLKNDGE 720 

SGGANRQ+NT FIKPEL+ + +EL + + I EKEVA L L K+E L QLK G+ 
Sbjct: 661 SGGANRQSNTTFIKPELEQISEELTRLVEQLKITEKEVAALQSDLIAKKEELTQLKIAGD 720 

Query: 721 QARIjEEQRADIEYQQLSEKLADLNKLYNGLQLSSGALEQTTSENEKNRLEKELEQFAIKK 780 

QARL EQRA + YQQL EK D L L S + E+ R+E+ L A KK 

Sbjct: 721 QARIJVEQRAQMAYQQLQEKQEDSKALLAALDQSQTTHSDESLLAEQARIEEALTAIAKKK 780 

Query: 781 EELTTSIAQIKEDKDSIQEKVWNLTTLLSEAQLEERDLLNEQKFERANCTRLEITLSEIK 840 

LT I IKE+KD I++K N+ LS+A+L+ERDLLNE+KFE+AN +RL L + + 
Sbjct: 781 NALTCDIDDIKENKDLIRQKTQNIHQALSQARLQERDLLNEKKFEQANQSRLRTQLKQCQ 840 

Query: 841 RDISNLQTLLSHQDSQLDKEELPRIEKQLLQVNNRRENDEEKLVSLRFELEDCEAALDDL 900 

++I L+++L++ SQ + LP+ +KQL + +++LV LRFE+ED EA L++ 

Sbjct: 841 QNILKLESIIJWJ1WSQDSIQRLPQWQKQLQDATEHKSGAQKRLVQLRFEIEDYEARLEET 900 

Query: 901 AASIAKEGQKNESLIRQQAQLESQCEQLSQQLMIFSRQLSEDYQMTLDEATOnCANVLEDI 960 

A + KE +KN++ IR+Q +LE+ EQ++ +L +++ LSED+QMTL +AK N ++ + 
Sbjct: 901 AEKITKESEKNDTFIRRQTKLETHLEQVANRIiRAYAKSLSEDFQMTLADAKEVTNSIDHL 960 

Query: 961 LMAREQLKSLQAKIKALGPWIDAIAQFEEWERLTFLNTQRDDLVHAKNLLLETITDMD 1020 

A+E+L LQ I+ALGP+N DAI Q+EEVHERLTFL +Q+ DL AKNLLLETI MD 
Sbjct: 961 ESAKEKLHHLQKTIRALGPINSDAINQYEEVHERLTFLTSQKTDLTKAKNLLLET1NSMD 1020 

Query: 1021 DEVKTRFKSTFEAIRHSFKETFVQMFGGGSADLILTEGDLLSAGVDISVQPPGKKIQSLN 1080 

EVK RFK TFEAI+ SFKETF QMFGGGSADL+LTE DLLSAG+ + 1 SVQPPGKKIQSLN 
Sbjct: 1021 SEVKARFKVTFEAIQKSFKETFTQMFGGGSADLVLTETDI.LSAGIEISVQPPGKKIQSLN 1080 

Query: 1081 LMSGGEKALSALALLFAI IRVKTI PFVILDEVEAALDEANVKRFGDYLNRFDKSSQFIW 1140 

LMSGGEKALSALALLFAI IRVKTI PFVILDEVEAALDEANVKRFGD+LNRFDK SQFIW 
Sbjct: 1081 LMSGGEKALSALALLFAI IRVKTIPFVILDEVEAAIiDEANVKRFGDFIMlFDKDSQFIVV 1140 

Query: 1141 THRKGTMSAADSIYGVTMQESGVSKIVSVKLKEAQEMTN 1179 

THRKGTM+AADSIYG+TMQESGVSKIVSVKLKEAQEMTN 
Sbjct: 1141 THRKGTMAAADSIYGITMQESGVSKIVSVKLKEAQEMTN 1179

SEQ ID 1162 (GBS199) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 52 (lane 2; MW 75kDa).

GBS199-GST was purified as shown in Figure 208, lane 3. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 359 

A DNA sequence (GBSx0390) was identified in S.agalactiae <SEQ ID 1165> which encodes the amino 
acid sequence <SEQ ID 1166>. This protein is predicted to be ribonuclease III (rnc). Analysis of this 
protein sequence reveals the following: 

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3372 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>
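
The localization report above follows a fixed pattern: one line per compartment, each with a certainty value and a qualitative label. As a minimal sketch only (it is not part of the analysis described here, and the names `parse_final_results` and `most_likely_site` are hypothetical), such lines could be read programmatically while tolerating the stray spaces that scanning can introduce into the numbers:

```python
import re

# One "Final Results" line: compartment name, certainty value, label.
# The value pattern allows spaces inside the number (e.g. "0 . 3372").
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty=\s*(?P<value>[\d. ]+?)\s*\((?P<label>[^)]*)\)"
)

def parse_final_results(text):
    """Return a list of (site, certainty, label) tuples found in text."""
    results = []
    for match in LINE_RE.finditer(text):
        value = float(match.group("value").replace(" ", ""))
        results.append((match.group("site"), value, match.group("label")))
    return results

def most_likely_site(text):
    """Return the tuple with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(text)
    return max(results, key=lambda r: r[1]) if results else None

if __name__ == "__main__":
    report = (
        "bacterial cytoplasm Certainty=0 . 3372 (Affirmative) < suco\n"
        "bacterial membrane Certainty=0 . 0000 (Not Clear) < suco\n"
        "bacterial outside Certainty=0 . 0000 (Not Clear) < suco\n"
    )
    print(most_likely_site(report))
```

Picking the maximum certainty is an assumption about how the report is meant to be read; the label in parentheses ("Affirmative" or "Not Clear") is kept alongside the number so a caller can apply a stricter rule.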

A related GBS nucleic acid sequence <SEQ ID 9711> which encodes amino acid sequence <SEQ ID 9712>
was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB13466 GB:Z99112 ribonuclease III [Bacillus subtilis]
Identities = 115/230 (50%) , Positives = 154/230 (66%) , Gaps = 1/230 (0%) 

Query: 13 KKMKEIjRSKLEKDYGIVFANQELLDTAFTHTSYANEHRIJIjNISHNERLEFLGDAVLQLLI 72 
KK+++ + E+ + F N++LL AFTH+SY NEHR NERLEFLGDAVL+L I

Sbjct: 15 KKVEQFKEFQER-ISVHFQNEKLLYQAFTHSSYVNEHRKKPYEDNERLEFLGDAVLELTI 73

Query: 73 SQYLFTKYPQKAEGDLSKLRSMIWEESIAGFSRLCGFDHYIKLGKGEEKSGGRNRDTIL 132 
S++LF KYP +EGDL+KLR+ IV E SL + F + LGKGEE +GGR R +L 
Sbjct: 74 SRFLFAKYPAMSEGDLTKLRAAIVCEPSLVSLAHELSFGDLVLLGKGEEMTGGRKRPALL 133

Query: 133 GDLFFAFLGALLLDKGVEVVHAFVNKVMIPHVEKGTYERVKDYKTSLQELLQSHGDVKID 192 

D+FEAF+GAL LD+G+E V +F+ + P + G + V D+K+ LQE +Q G ++ 
Sbjct: 134 ADVFFAFIGALYLDQGLEPVESFLKVYVFPKINDGAFSHVMDFKSQLQEYVQRDGKGSLE 193 






Query: 193 YQVTNESGPAHAKEFEVTVSVNQENLSQGIGRSKKAAEQDAAKNALATLQ 242 

Y+++NE GPAH +EFE VS+ EL G GRSKK AEQ AA+ ALA LQ 
Sbjct: 194 YKISNEKGPAHNREFEAIVSLKGEPLGVGNGRSKKEAEQHAAQEALAKLQ 243 
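
Each alignment is introduced by a summary line giving identities, positives and (where present) gaps as a count over the alignment length. The following is a hedged sketch of how such a line could be parsed and cross-checked; the function name `parse_alignment_stats` is hypothetical, and the recomputed percentage (integer division) is offered only as a consistency check, since the printed percentages need not match it for every alignment.

```python
import re

# Matches e.g. "Identities = 115/230 (50%)" with flexible spacing.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_alignment_stats(line):
    """Return {name: {count, length, reported_pct, recomputed_pct}} for one summary line."""
    stats = {}
    for name, count, length, pct in STAT_RE.findall(line):
        count, length, pct = int(count), int(length), int(pct)
        stats[name.lower()] = {
            "count": count,
            "length": length,
            "reported_pct": pct,
            # Truncated percentage; an assumption about how the figure was rounded.
            "recomputed_pct": 100 * count // length,
        }
    return stats

if __name__ == "__main__":
    summary = "Identities = 115/230 (50%) , Positives = 154/230 (66%) , Gaps = 1/230 (0%)"
    for name, values in parse_alignment_stats(summary).items():
        print(name, values)
```

A mismatch between `reported_pct` and `recomputed_pct` would be a reason to re-inspect the scanned figures rather than proof of an error.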



A related DNA sequence was identified in S.pyogenes <SEQ ID 1167> which encodes the amino acid
sequence <SEQ ID 1168>. Analysis of this protein sequence reveals the following:

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1414 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 170/227 (74%) , Positives = 192/227 (83%)

Query: 15 MKELRSKLEKDYGIVFANQELLDTAFTHTSYANEHRLLNISHNERLEFLGDAVLQLLISQ 74
MK+L L + I F + LL+TAFTHTSYANEHRLIJSr+SHNERLEFLGDAVLQL+IS+
Sbjct: 1 MKQLEELLSTSFDIQFNDLTLLETAFTHTSYANEHRLriNVSHNERLEFLGDAVLQLIISE 60

Query: 75 YLFTKYPQKAEGDLSKLRSMIWEESLAGFSRLCGFDHYIKLGKGEEKSGGRNRDTILGD 134 

YLF KYP+K EGD+SKLRSMIVREESLAGFSR C FD YIKLGKGEEKSGGR RDTILGD 
Sbjct: 61 YLFAKYPKKTEGDMSKLRSMIVREESLAGFSRFCSFDAYIKLGKGEEKSGGRRRDTILGD 120 

Query: 135 LFFAFLG^LLDKGVEVVHAFWKVMIPHVEKGTYERVKDYKTSLQELLQSHGDVKIDYQ 194 

LFEAFLGALLLDKG++ V F+ +VMIP VERG +ERVKDYKT LQE LQ+ GDV. IDYQ 
Sbjct: 121 LFEAFLGAIiLLDKGlDAVRRFLKQVMIPQVEKGNFERVKDYKTCLQEFLQTKGDVAIDYQ 180 

Query: 195 VITOSGPAHAKEFEVTVSVNQENLSQGIGRSKKAAEQDAAKNALATL 241 

V +E GPAHAK+ FEV+ + VN LS+G+G+SKK AEQDAAKNALA L 
Sbjct: 181 VISEKGPAHAKQFEVSIWNGAVLSKGLGKSKKLAEQDAAKNALAQL 227 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 360 

A DNA sequence (GBSx0391) was identified in S.agalactiae <SEQ ID 1169> which encodes the amino 
acid sequence <SEQ ID 1170>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.19 Transmembrane 100 - 116 ( 99 - 117)
INTEGRAL Likelihood = -2.44 Transmembrane 81 - 97 ( 81 - 97)

Final Results

bacterial membrane --- Certainty=0.2678 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC12789 GB:AJ279090 hypothetical protein [Staphylococcus 
carnosus] 

Identities = 50/114 (43%) , Positives = 72/114 (62%) 

Query: 3 KIFYISLGFISLGIGIAGI VLPWPTTPLvLLSAFCFSRSSEKFDIWLRQTKVYKYYAAD 62 

K ++LG I GIG GIV+P++PTTP +LL+A CFSRSS+KF+ WL TK++ Y 
Sbjct: 2 KYVLMTLGLIFAGIGWGIWPLLPTTPFLLLAAICFSRSSKKFNRWLVNTKIHDEYVES 61 

Query: 63 FVESRSIAPARKKSMIWQIYILMGISIYFAPLMWLKLGLLIGTIVGTYVLFYW 116 

F + +K ++ +YILMGISI+ +++++ LLI V T VLF V 

Sbjct: 62 FKRDKGFTLKKKFKLLTSLYILMGIS I FI IDNLYIRITLLIMLFVQTWLFTFV 115 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 361 

A DNA sequence (GBSx0392) was identified in S.agalactiae <SEQ ID 1171> which encodes the amino
acid sequence <SEQ ID 1172>. Analysis of this protein sequence reveals the following:

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1908 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 1173> which encodes the amino acid
sequence <SEQ ID 1174>. Analysis of this protein sequence reveals the following:

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1610 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 225/269 (83%) , Positives = 248/269 (91%) 



Query: 1 MSEIGFKYSILASGSTGNCFYIETPQKRLLIDAGLTGKiCVTSLLAEINRKPEDLDAILVT 60
M+E GFKYS ILASGSTGNCFY+ETP+KRLLIDAGLTGKK+TSLLAEI +RKPEDLDAIL+T
Sbjct: 1 MNESGFKYSILASGSTGNCFYLETPKKRLLIDAGLTGKKITSLLAEIDRKPEDLDAILIT 60

Query: 61 HEHSDHIKGVGVLARK^HLDIYANEQTWKWCIERNMI^KVDVSQKHVFGRGKTLTFGDLD 120
HEHSDHIKGVGV+ARKYHLDIYANE+TW++MDE NMLGK+D SQKH+F R K LTFGD+D
Sbjct: 61 HEHSDHIKGVGVMARKYHLDIYANEKTWQLMDECMLGKLDASQKHIFQRDKVLTFGDVD 120

Query: 121 IESFGVSHDAVDPQFYRMMKDDKSFvMLTDTGYVSDRMAGLIENADGYLIESNHDIEILR 180
IESFGVSHDA+DPQFYR+MKD+KSFVMLTDTGYVSDRM G+IENADGYLIESNHDIEILR
Sbjct: 121 IESFGVSHDAIDPQFYRIMKDNKSFVMLTDTGYVSDRMTGIIENADGYLIESNHDIEILR 180

Query: 181 SGSYPWTLKQRILSDKGHLSNEDGSETMIRTIGNRTKHIYLGHLSKENNIKELAHMTMEN 240
SGSYPW+LKQRILSD GHLSNEDG+ MIR++G TK IYLGHLSKENNIKELAHMTM N
Sbjct: 181 SGSYPWSLKQRILSDLGHLSNEDGAGAMIRSLGYNTKKIYIJGHIlSKENNIKEIlAHMT^lVN 240

Query: 241 NLMRADFGVGTDFSVHDTSPDSATPLTRI 269
L AD VGTDF+VHDTSPD+A PLT I
Sbjct: 241 QliAMADIAVGTDFTVHDTSPDTACPLTDI 269





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 362 

A DNA sequence (GBSx0393) was identified in S.agalactiae <SEQ ID 1175> which encodes the amino
acid sequence <SEQ ID 1176>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.94 Transmembrane 15 - 31 ( 5 - 34)

Final Results

bacterial membrane Certainty=0.5776 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 1177> which encodes the amino acid 
sequence <SEQ ID 1178>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 335/443 (75%) , Positives = 392/443 (87%) 

Query: 7 NIRSFEIALLFLLVFVAVYFVYLAVRDFKMSKNIRLLNWKVRDLIAGNYSDSI^ 66

N+ +FELA+L LLVFVA YF++LAVRD++ ++ IR+++ K+RDLI G Y+D I + D + 
Sbjct: 8 NLSTFEIAILILLVFVAFYFIHLAVRDYRNARIIRMMSHKIRDLINGRYTDIIDEKADIE 67 

Query: 67 LWLGESIJTOLSDVFRMAHDNLEQEKNRIjASILTYMrDGvIiATDRSGKIVMINETAQQQF 126
L+EL + LNDLSDVFR+ H+NL QEKNRLASIL YM+DGVLATDRSGKI+MINETA++Q

Sbjct: 68 LMELSDQIOTLSDVFRLTHENIAQEKiroLASILAYMSDGVLATDRSGKIIMINETARKQL 127 

Query: 127 NLAYDEALSMNIVDMLGSGSPYSFQDLVSKTPEVvIJJRRD^^ 186 
NL+ +EAL NI D+L + Y+++DLVSKTP V +N R++ GEFV+LR+RFALNRRESG 
Sbjct: 128 ISTLSKEEALKKNITDLLEGDTSYTYRDLVSKTPVvTvNSRNDMGEWSLRLRFALNRRESG 187

Query: 187 FISGLVAVSHDATEQEKEERERRLFVSOTSHELRTPLTSVKSYLEALDEGALNEEVAPSF 246 

FISGLV V HD TEQEKEERERRLFVSNVSHELRTPLTSVKSYLEALDEGAL E++APSF 
Sbjct: 188 FISGLVWLHDTTEQEKEERERRLFVSNVSHELRTPLTSVKSYLEALDEGALKEDIAPSF 247 


Query: 247 IKVSLDElTOMMRMISDLLSLSRIDNEVTHLDVEMTNFTAFMTSIl^mFDQIRNQKTVTG 306 

IKVSLDETNRMMRMISDLL+LSRIDN+VT L VEMTNFTAF+TSILNRFD ++NQ T TG 
Sbjct: 248 IKVSLDETNRMMRMISDLLNLSRIDNQVTQIAVEMTNFTAFITSIIimFDLvraiQHTGTG 307 

Query: 307 KVYEIVRDYPLKSIWVEIDTDKMTQVIDNII^AVKYSPDGGKITVNLRTTKTQMILSIS 366

KVYEIVRDYP+ S+W+EID DKMTQVI+NILNNA+KYSPDGGKITV ++TT TQ+I+SIS 
Sbjct: 308 ro/YEIVRDYPITSWIEIDNDKMTQVIENIMmiKYSPDGGKITVRMKTTDTQLIISIS 367 

Query: 367 DQGLGIPKKDLPLIFDRFYRVDKARSRKQGGTGLGLSIAKEIVKQHKGFIWAKSEYGKGS 426 
DQGLGIPK DLPLI FDRFYRVDKARSR QGGTGLGL+IAKEI+KQH GFIWAKS+YGKGS

Sbjct: 368 DQGLGI PKTDLPLI FDRFYRVDKARSRAQGGTGLGLAIAKEIIKQHHGFIWAKSDYGKGS 427 

Query: 427 TFTIVLPYDKDAVTYEEWEDVKD 449 
TFTIVLPY+KDA YEEWE+ D 
Sbjct: 428 TFTIVLPYEKDAAIYEEWEEDVD 450

A related GBS gene <SEQ ID 8561> and protein <SEQ ID 8562> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 8.59

GvH: Signal Score (-7.5): -3.38

Possible site: 26
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -11.94 threshold: 0.0
INTEGRAL Likelihood =-11.94 Transmembrane 15 - 31 ( 5 - 34)

PERIPHERAL Likelihood = 8.27 178 
modified ALOM score: 2.89 






*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.5776 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the databases: 

67.5/83.5% over 439aa 

Streptococcus pneumoniae 
GP| 5830524 | histidine kinase Insert characterized 

ORF01458(331 - 1647 of 1947) 

GP|5830524|emb|CAB54569.1| |AJ006392(10 - 449 of 449) histidine kinase {Streptococcus pneumoniae}

%Match = 45.6
%Identity = 67.5 %Similarity = 83.4

Matches = 297 Mismatches = 70 Conservative Sub.s = 70 

126 156 186 216 246 276 306 336 

ITSPFSDTYRTSHDTRTFIGNSLGI*LFWRCPYS*CDGETFT*KD*RYSWSSRIYFDSTWCRXIT*SLMNNSAANIRSFE

I 

MLDLLKQTIFT 
10 

366 396 426 450 480 510 540 570

LMjLFLLVFVAWFVYLAVRDFKMSKNIRL--LNWCT 

==l=l=== =1 = 11 = 1 =1 11=11111=11 = =11 = = = = =11111 = 1 1= = = 1111 

RDFIFILXLLGFILVVTLLLLENRRDNIQLKQW^ 

30 40 50 60 70 80 90 



15 



30 



600 630 660 690 720 750 780 810 

EKNRl^ILTYMTDGVIATDRSGKIVMINETAQQQ 



ESKRLNSILFYMTDGVIATTTORGQIIMIiroTAKKQLGLVKEDVL 
110 120 130 140 150 160 170

840 870 900 930 960 990 1020 1050 

FVTLRIRFAIjNRRESGFISGLVAVSHDATEQEKEERERRLFVSlWSHELRTPLTSVKSYLFALDEGAIJffiEVAPSFIKVS 

= = ii = iim mimiimi ii mmmmmmiiimiiiimimiim i m urn 

YIJ^RWFALIRRESGFISGLVAVLHDTTEQEKEERERRLWSNVSHELRTPLTSVKSYLEALDEGALCETVAPDFIKVS

190 200 210 220 230 240 250 



1080 1110 1140 1170 1200 1230 1260 1290 

LDETNRMMRMISDLLSLSRIDNEOTHLDVEMTNFTAFMTSILNRFDQIR 

iiiiiiiiii.-.-in nun =11111= 11111 = 1 iiiiii=== i i ii = iiim= 111 = 11111111 

LDETNRMMRMVTDLLHLSRIDNATSHLDVE^^ 

270 280 290 300 310 320 



1320 1350 1380 1410 1440 1470 1500 1530 

QVIDNIIiNNAVKYSPDGGKITVNLRTTKTQMILSISDQGLGIP

ll=lllllll=lllllllllll ==ll= 11111111=111111=111 111111111=1111 11111111111111=1 
QVVDNILNNAIKYSPDGGKITVRMKTTEDQMILSISDHGLGIPKQDLPRIFDRFYRVDRARSRAQGGTGLGLSIAKEIIK 
340 350 360 370 380 390 400 

1560 1590 1620 1647 1677 1707 1737 1767

QHKGFIWAKSEYGKGSTFTIVLPYDKDAVTYEEWED-VED*NMSEIGFKYSILASGSTGNCFYIETPQKRLLIDAGLTGK 

Illlllllllllllillllllllllllll I III III 
QHKGFIWAKSEYGKGSTFTIVLPYDKDAVKEEVWEDEVED 
420 430 440 


SEQ ID 1176 (GBS41) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 4 (lane 7; MW 50kDa), in Figure 168 (lane 2-4; MW 65kDa - thioredoxin fusion) 
and in Figure 238 (lane 4; MW 65kDa). It was also expressed in E.coli as a GST-fusion product. SDS- 
PAGE analysis of total cell extract is shown in Figure 13 (lane 7; MW 75kDa). 

Purified Thio-GBS41-His is shown in Figure 244, lane 10.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 363 

A DNA sequence (GBSx0394) was identified in S.agalactiae <SEQ ID 1179> which encodes the amino
acid sequence <SEQ ID 1180>. This protein is predicted to be VicR protein (regX3). Analysis of this
protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2754 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 1181> which encodes the amino acid 
sequence <SEQ ID 1182>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2754 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 205/236 (86%) , Positives = 221/236 (92%) 

Query: 1 MKKILIVDDEKPISDIIKFNLTKEGYETATAFDGRFALVQYAEFQPDLIILDLMLPELDG 60

MKKILIVDDEKPISDIIKFNLTKEGY+ TAFDGREA+ + E +PDLI ILDLMLPELDG 
Sbjct: 1 MKKILIVDDEKPISDIIKFNLTKEGYDIVTAFDGREAVTIFEEEKPDLIILDLMLPELDG 60 

Query: 61 LEVAKETOKTSHIPIIMLSAKDSEFDKVIGLEIGADDyVTKPFSNRELLARVKAHLRRTE 120 
LEVAKE+RKTSH+PI IMLSAKDSEFDKVIGLEIGADDYVTKPFSNRELLARVKAHLRRTE

Sbjct: 61 LEVAKEIRKTSHVPIIMLSAKDSEFDKVIGLEIGADDYVTKPFSNRELLARVKAHLRRTE 120 

Query: 121 NIETAVAEESAQNASSDITIGEI^ILPDAFIAKKRGEEIELTHREFELLHHIA 180 
IETAVAEE+A + + ++TIG LQILPDAF+AKK G+E+ELTHREFELLHHIA H+GQVM 
Sbjct: 121 TIETAVAEENASSGTQELTIGNLQILPDAWAKKHGQEVELTHREFELLHHLANHMGQVM 180

Query: 181 TREHLLETWGYDYFGDTOTVDVTVRRLREKIEDTPGRPEYILTRRGVGYYMKSYE 236 

TREHLLE VWGYDYFGDVRTVDVTVRRLREKIEDTP RPEYILTRRGVGYYMKSY+ 
Sbjct: 181 TREHbLEIWGYDYFGDWTVDVTVRRLREKIEDTPSRPEYILTRRGVGYYMKSYD 236 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 364 

A DNA sequence (GBSx0395) was identified in S.agalactiae <SEQ ID 1183> which encodes the amino
acid sequence <SEQ ID 1184>. This protein is predicted to be amino acid ABC transporter, ATP-binding
protein. Analysis of this protein sequence reveals the following:

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3791 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB14701 GB:Z99118 glutamine ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 149/244 (61%) , Positives = 200/244 (81%) , Gaps = 2/244 (0%) 

Query: 3 LISYKNVNKYYGDYHALRQINLEIEPGQVVVLLGPSGSGKSTLIRTMNALESIDDGSLVV 62

+I+++NVNK+YGD+H L+QINL+IE G+VW++GPSGSGKSTL+R +N LESI++G L V 
Sbjct: 1 MITFQNVNKHYGDFHVLKQINLQIEKGEvWIIGPSGSGKBTLLRCINRLESINEGVLTV 60

Query: 63 NGHELMOSSKELVlttRKEVGIWFQHENLYPHKTV^ 122

NG + N ++ +R+ +GMVFQHF+LYPHKTVL+NI LAP+KVL+QS ++A E A 
Sbjct: 61 NGTAI-ISTORKTDINQWQNIGIWFQHFHLYPHKXVLQNIMLAPVKVLRQSPEQAKETARY 119 

Query: 123 YLKFVNMWERKDSYPSMLSGGQKQRIAIARGLAMHPKLLLFDEPTSALDPETIGDVLSVM 182

YL+ V + ++ D+YPS LSGGQ+QR+AIARGLAM P+++LFDEPTSALDPE IG+VL VM 
Sbjct: 120 YLEKVGIPDKADAYPSQLSGGQQQRVAIARGLAMKPEVMLFDEPTSALDPEMIGEVLDVM 179 

Query: 183 QKLANDGMNMWVTHEMGFAREVADRI I FMADGE I LVDTTDVQDFFDNPREPRAKQFLSN 242 
+ LA +GM MWVTHEMGFA+EVADRI+F+ +G+IL + +F+ NP+E RA+ FLS

Sbjct: 180 KTLAKEGMTMWVTHEMGFAKEVADRI VF IDEGKILEEAVPA- EFYANPKEERARLFLSR 238 

Query: 243 IINH 246 
I+NH 

Sbjct: 239 ILNH 242

A related DNA sequence was identified in S.pyogenes <SEQ ID 1185> which encodes the amino acid
sequence <SEQ ID 1186>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3763 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 131/243 (53%) , Positives = 179/243 (72%) , Gaps = 2/243 (0%) 

Query: 2 SLISYKNVNKYYGDYHALRQINLEIEPGQVWLIjGPSGSGKSTLIRTMNALESIDDGSLV 61

++IS K+++KYYG L+ I+L+I PG+WV++GPSGSGKSTL+RTMN LE G + 
Sbjct: 5 AIISIKDLHKYYGHNEVLKGIDLDIMPGEVVVIIGPSGSGKSTLLRTMNLLEVPTKGQIR 64 

Query: 62 WGHEI^ISSKELVmRKEVGMVFQHFNLYPHKTVLENITIAPIKOTiKQSKKEAMEIAE 121 
G ++ + ++ ++R+++GMVFQ FNL+P+ T+LENITL+PIK +K EA + A

Sbjct: 65 FEGIDITD-KKNDIFS^TOEKMG^IVFQQFNLFPNMTILENITLSPIKTKGMAKAEADKTAL 123 

Query: 122 KYLKFVNMWERKDSYPSMLSGGQKQRIAIARGLAMHPKLLLFDEPTSALDPETIGDVLSV 181 
L V + E+ +YP+ LSGGQ+QRIAIARGLRM P +LLFDEPTSALDPE +G+VL+V 
Sbjct: 124 SLLDKVGLSEKAKAYPASLSGGQQQRIAIARGLAMDPDVLLFDEPTSALDPEMVGEVLAV 183

Query: 182 MQKLANaSMNIWVVTHEMGFAREVADRIIFMADGEILVDTTDVQDFFDNPREPRAKQFLS 241 

MQ LA GM MV+VTHEMGFA+EVADR++FM DG ++V+ FD +E R K FLS 

Sbjct: 184 MQDLAKSGMTMVIVTHEMGFAKEVADRVMFM-DGGVIVEEGSPNQLFDLTKEERTKDFLS 242 

Query: 242 NII 244
++ 

Sbjct: 243 RVL 245 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 365 

A DNA sequence (GBSx0396) was identified in S.agalactiae <SEQ ID 1187> which encodes the amino
acid sequence <SEQ ID 1188>. This protein is predicted to be glutamine-binding. Analysis of this protein
sequence reveals the following:

Possible site: 27 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB73178 GB:AL139076 probable ABC-type amino-acid transporter
periplasmic solute-binding protein [Campylobacter
jejuni]

Identities = 99/240 (41%) , Positives = 141/240 (58%) , Gaps = 3/240 (1%) 


Query: 1 MLRRKRLTFYLLSCIFIFLLFYPNSTSANQLSEIKKSGVLKVGVKQDVPNFGYYNAETNQ 60 

M+ RK L + + + F + + +L IK G L VGVK DVP++ + T + 
Sbjct: 1 MVFRKSLLKLAVFALGACVAFSNANAAEGKLES I KSKGQLI VGVKND VPHYALLDQATGE 60 

Query: 61 YEGMEIDIAKKIAKSL GVKPVFVPTTAQTREPLMDNGQIDILIATYTITPERKANYN 117

+G E+D+AK +AKS+ K V A+TR PL+DNG +D +IAT+TITPERK YN 

Sbjct: 61 IKGFEVDVAKLLAKSILGDDKKIKLVAVNAKTRGPLLDNGSVDAVIATFTITPERKRIYN 120 

Query: 118 ISKAYYHDEIGFLTOKNSHIKTIKELDGKHIGVAQGATTKVNLEKYAKEHKLKFSYAQLG 177 
S+ YY D IG LV K K++ ++ G +IGVAQ ATTK + + AK+ + +++

Sbjct: 121 FSEPYYQDAIGLLVLKEKKYKSLADMKGANIGVAQAATTKKAIGEAAKKIGIDVKFSEFP 180 

Query: 178 SFPEIAISLYANRIDAFSVDKSILSGYLSPHTTILKEGFNTQEYGIATSKQDKVLIPYVN 237 
+P + +L A R+DAFSVDKSIL GY+ + IL + F Q YGI T K D YV+ 
Sbjct: 181 DYPS I KAALDAKR VDAFSVDKS I LLGYVDDKSEILPDS FEPQSYGI VTKKDDPAFAKYVD 240

A related DNA sequence was identified in S.pyogenes <SEQ ID 1189> which encodes the amino acid
sequence <SEQ ID 1190>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.16 Transmembrane 17 - 33 ( 15 - 35)

Final Results

bacterial membrane Certainty=0.3463 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

A related sequence was also identified in GAS <SEQ ID 9097> which encodes the amino acid sequence 
<SEQ ID 9098>. Analysis of this protein sequence reveals the following: 

>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.000 (Not Clear) <succ>

bacterial outside Certainty=0.000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.000 (Not Clear) <succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 66/251 (26%) , Positives = 111/251 (43%) , Gaps = 27/251 (10%) 

Query: 23 PNSTSANQLSEIKKSGVLKVGVKQDVPNFGYYNAETNQYEGMEIDIAKKIAKSLGVKPVF 82

P+ + + IK+ GVLKV +YN + N+ G E+D+ K+I K L +K F 

Sbjct: 34 PHQSQKSSWDT I KEKG VLKVATPGTYQPTSFYN-DNNELVGYEVDMVKE IGKRLNI KVKF 92 

Query: 83 VPTTAQTREPLMDNGQIDILIATYTITPERKANYNISKAYYHDEIGFLVR KNSHIK 138 

V T +D+G++DI + + ITP+R+ YNIS Y + G +VR N K

Sbjct: 93 VETGFDQAFTSVDSGRvDISI™FDITPKRQKKYNISTPYKYGVGGMIVRADGSSNIAKK 152 

Query: 139 TIKELDGKHIGVAQGATTKVNLEKYAKEHKLKFSYAQLGSFPELAISLYANRI 191 

+ + GK AG +K A+L ++ + +Y N + 

Sbjct: 153 DLSDWKGKKAAGASGTEYMKVAQKQG AELVTYDNVTGDVYLNDVANGRTDF 203



Query: 192 - - DAFSVDKS ILSGYLSPHTTILKE GFNTQEYGIATSKQDKVL I PYVNKLLVSWEK 245
+ + K + LS + + + +N E GI +K+D L ++ ++ K

Sbjct: 204 I PNDYPAQKLFVDYMLSQNPNliNVKMSDVQYN^ KDMI K 263 

Query: 246 DGSLKHIYQKF 256 
DGSLK I + +

Sbjct: 264 DGSLKKISETY 274 

SEQ ID 1188 (GBS136) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 29 (lane 5; MW 29.9kDa).

The GBS136-His fusion product was purified (Figure 200, lane 6) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 284), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 366

A DNA sequence (GBSx0397) was identified in S.agalactiae <SEQ ID 1191> which encodes the amino 
acid sequence <SEQ ID 1192>. This protein is predicted to be integral membrane. Analysis of this protein 
sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.34 Transmembrane 32 - 48 ( 27 - 55) 

INTEGRAL Likelihood = -5.04 Transmembrane 200 - 216 ( 196 - 219) 

INTEGRAL Likelihood = -3.13 Transmembrane 93 - 109 ( 93 - 113) 

INTEGRAL Likelihood = -2.02 Transmembrane 74 - 90 ( 74 - 92) 




Final Results 

bacterial membrane Certainty=0.4736 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>
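
The INTEGRAL lines reported for this protein each give a likelihood score, a predicted transmembrane core (start and end residue) and a wider range in parentheses. As an illustrative sketch only (the `TMSegment` type and `parse_tm_segments` function are hypothetical names, not part of the analysis software), the segments can be extracted and ordered along the sequence like this:

```python
import re
from typing import List, NamedTuple

class TMSegment(NamedTuple):
    likelihood: float   # score printed after "Likelihood ="
    core_start: int     # first residue of the predicted core
    core_end: int       # last residue of the predicted core
    outer_start: int    # start of the wider range in parentheses
    outer_end: int      # end of the wider range in parentheses

# Accepts both "Likelihood = -9.34" and "Likelihood =-11.94".
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_tm_segments(text: str) -> List[TMSegment]:
    """Return all predicted segments in text, sorted by core start position."""
    segments = [
        TMSegment(float(lik), int(a), int(b), int(c), int(d))
        for lik, a, b, c, d in TM_RE.findall(text)
    ]
    return sorted(segments, key=lambda s: s.core_start)

if __name__ == "__main__":
    report = """
    INTEGRAL Likelihood = -9.34 Transmembrane 32 - 48 ( 27 - 55)
    INTEGRAL Likelihood = -5.04 Transmembrane 200 - 216 ( 196 - 219)
    INTEGRAL Likelihood = -3.13 Transmembrane 93 - 109 ( 93 - 113)
    INTEGRAL Likelihood = -2.02 Transmembrane 74 - 90 ( 74 - 92)
    """
    for segment in parse_tm_segments(report):
        print(segment)
```

Sorting by position rather than by score is a presentation choice; the report itself lists the segments in order of decreasing magnitude of the likelihood value.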

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB73177 GB:AL139076 putative ABC-type amino-acid transporter 
permease protein [Campylobacter jejuni] 
Identities = 112/226 (49%) , Positives = 160/226 (70%) , Gaps = 3/226 (1%) 

Query: 5 NISPFAISRWGAFFNHFDLFFKGFLYTLGISFGALLLALILGILSGGLSTSKSKVGKLIS 64 

+ISPFA+ ++ ++ D F GF+YTL +S ALL+A I G + G ++TS+ K+ + + 
Sbjct: 25 SISPFAVWKFLDALDNKDAFINGFIYTLEVSILALLIATIFGTIGGVMATSRFKIIRAYT 84 

Query: 65 RIYVEVFQNTPLLVQMVFVYYGLAIISNGHVMISAFFTAVLCVGLYHGAYISEVIRSGIE 124
RIYVE+FQN PL++Q+ F++Y L ++ + + F VL VG YHGAY+ SEV+RSGI 
Sbjct: 85 RIYVELFQNVPLVIQIFFLFYALPVLG IRLDIFTIGVLGVGAYHGAYVSEWRSGIL 141 

Query: 125 AVPKGQTEAALAQGFTANQTMQL 1 1 LPQATOTILPPMTNQVVNLI KNTSTVAI I SGADIM 184 
AVP+GQ EA+ +QGFT Q M+ II+PQ +R ILPPMTNQ+VNLIKNTS +1+ GA++M

Sbjct: 142 AVPRGQFFJ^ASCGFTYIQQMRYIIVPC/TIRIILPPMTNQMVNLIKNTSVLLIVGGAELM 201 

Query: 185 FVAKAWAYDTTNYIPAFAGAAIFYFVICFPLASWARKQEELNKKTY 230 
A ++A D NY PA+ AA+ YF+IC+PLA +A+ E KK + 
Sbjct: 202 HSADSYAADYGNYAPAYI FAAVLYFI I CYPLAYFAKAYENKLKKAH 247

A related DNA sequence was identified in S.pyogenes <SEQ ID 1193> which encodes the amino acid
sequence <SEQ ID 1194>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.26 Transmembrane 307 - 323 ( 303 - 327)

INTEGRAL Likelihood = -5.89 Transmembrane 485 - 501 ( 479 - 502)

INTEGRAL Likelihood = -1.12 Transmembrane 375 - 391 ( 375 - 391)

Final Results

bacterial membrane Certainty=0.3506 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the databases:

>GP:BAA17584 GB:D90907 glutamine-binding periplasmic protein
[Synechocystis sp.]

Identities = 146/532 (27%) , Positives = 244/532 (45%) , Gaps = 59/532 (11%) 

Query: 6 YMKKLILSCLVALALLFGGMSRAQANQYLRVGMEAAYAPFNWTQDDASNGAVPIEGTSQY 65
Y L L L+A+A+ +Q + VE+PFT ETQ 
Sbjct: 16 YYLLLALGVLLAIAIPLLPAFSQVSRQTIIVATEPTFPPFEMTD EATGQL 65 

Query: 66 ANGYDVQVAKKVAKAMNKELLVVKTSWTGLIPALTSGKIDMIAAGMSPTKERRNEISFSN 125 
G+DV + + + +A + + + G+IPAL S + + ++ T ER +SFS+

Sbjct: 66 T-GFDVDLIQAIGFiAAQVTVDIQGYPFDGIIPALQSNTVGAAISAITITPERAQSVSFSS 124 

Query: 126 SSYTSQPVLVVTANGKYADATSLKDFSGAKOTAQQ^WHVNLLTQLKGAKLQTPMGDFSQ 185 
+ S VL + +LKD G ++ G + T + GAK+ T + 

Sbjct: 125 PYFKS--VIAIAVQDGNDTIKNLKDLEGKRI^VAIGTTGAMVATNVPGAKV-TNFDSITS 181

Query: 186 MRQALTSGVIDAYISERPEAMTAEAADSRLKMITLKKGFAVAESDARIAVGMKKNDDRMA 245 

Q L +G DA I++RP + A D+ L+ + + +E IA+ + + 
Sbjct: 182 ALQELVNGNADAVINDRPVLLYA-IKDAGLRNVKISADVG-SEDYYGIAMPLAPPGE 236 


Query: 246 TVNQ VLEGFSQTDRMALMDDMVTKQPVE KKAEDAKAS FLGQMWAI FKGN - - 294 

+NQ E +Q ++++ EK + FJj + G 

Sbjct: 237 - INQTREVLNQ-GLFQI IENGTYNAIYEKWFGEKNPPFLPLVAPSLVGKVGTAQSLTERS 294 

Query: 295 WKQFLRGTGMTLLISMVGTITGLFIGLLIGIFRTAPKAKHKVAALGQK 342

++ +G+ +T+L++ GL G + I + K 
Sbjct: 295 QANPNDNFLITLFRNLFKGSILTVLLTAFSVFFGLIGGTGVAIALISDI K 344 

Query: 343 LFGWLLTIYIEIFRGTPMIVQSMVIYYGTAQAF GISIDRTLAAIFIVSINTGAYM 397 

+ IY+E FRGTPM+VQ +IY+G F GI+IDR AAI +S+N AY+

Sbjct: 345 PLQLIFRIYVEFFRGTPMLVQLFIIYFGLPALFKEIGLGITIDRFPAAIIALSLNVAAYL 404 

Query: 398 SEIVRGGIFAVDKGQFKAATALGFTHGQTMRKIVLPQWRNILPATGNEFVINIKDTSVL 457 
+EI+RGGI ++D+GQ++A +LG + QTM++++ PQ R ILP GNEF+ IKDTS+ 
Sbjct: 405 AEIIRGGIQSIDQGQWEACESLGMSPWQTMKEVIFPQAFRRILPPLGNEFITLIKDTSLT 464

Query: 458 NVISWELYFSGNTVATQTYQYFQTFTIIAIIYFVLTFTVTRILRYIERRFD 509 

VI EL+ G + TY+ F+ + +A++Y +LT + + +++E D 
Sbjct: 465 AVIGFQELFREGQLIVATTYRAFEVYIAVALVYLLLTTISSFVFKWLENYMD 516 


An alignment of the GAS and GBS proteins is shown below: 

Identities = 82/210 (39%) , Positives = 113/210 (53%) , Gaps = 12/210 (5%) 

Query: 14 WGAFFNHFDLFFKGFLYTLGISFGALLLALILGILSGGLSTS KSKVGKL 1 63 

55 W F ++ F +G TL IS + L +G+L G T+ K KV L + 

Sbjct: 288 WAIFKGNWKQFLRGTGMTLLISMVGTITGLFIGLLIGIFRTAPKAKHKVAALGQKLFGWL 347 

Query: 64 SRIYVEVFQNTPLLVQMVFVYYGLAIISNGHVMISAFFTAVLCVGLYHGAYISEVIRSGI 123 
IY+E+F+ TP++VQ + +YYG A +1 A+ V + GAY+SE++R GI 

60 Sbjct: 348 LTIYIEIFRGTPMIVQSMVIYYGTAQAFG--ISIDRTLAAIFIVSINTGAYMSEIVRGGI 405 

Query: 124 EAVPKGQTFjyUAQGFTANQTMQLIILPQAWTILPPMTNQVvNLIKNTSTVAIISGADI 183 

AV KGQ +AA A GFT QTM+ I+LPQ VR ILP N+ V IK+TS + +IS ++ 
Sbjct: 406 FAVDKGQFKAATALGFTHGQTMRKIVLPQVVRNILPATGNEWINIKDTSVIJWISVVEL 465 






Query: 184 MFVAKAWAYDTTNYIPAFAGAAIFYFVTCF 213 
F A T Y F AI YFV+ F

Sbjct: 466 YFSGNTVATQTYQYFQTFTI IAI I YFVLTF 495 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.
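The Identities, Positives and Gaps figures quoted for each alignment are counts over the aligned length. The sketch below is only an illustration of how such a summary line can be read back into numbers; `parse_alignment_summary` is a hypothetical helper, not part of the analysis described here. It assumes the printed percentages are truncated rather than rounded, which is consistent with the values shown above (113/210 is printed as 53%).

```python
import re

# Hypothetical helper (an assumption for illustration, not the original pipeline).
_STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line):
    """Map each statistic name to (count, aligned_length, percent)."""
    stats = {}
    for name, count, length in _STAT.findall(line):
        count, length = int(count), int(length)
        # Integer division truncates, matching the percentages printed in the text.
        stats[name] = (count, length, 100 * count // length)
    return stats


summary = parse_alignment_summary(
    "Identities = 82/210 (39%) , Positives = 113/210 (53%) , Gaps = 12/210 (5%)"
)
for name, (count, length, percent) in summary.items():
    print(f"{name}: {count}/{length} = {percent}%")
```

The same function can be applied to any of the summary lines in these Examples; a line without a Gaps entry simply yields no "Gaps" key.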

Example 367 

A DNA sequence (GBSx0398) was identified in S.agalactiae <SEQ ID 1195> which encodes the amino 
acid sequence <SEQ ID 1196>. This protein is predicted to be amino acid ABC transporter, permease 
protein. Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -6.95 Transmembrane 25 - 41 ( 16 - 42) 

INTEGRAL Likelihood = -3.61 Transmembrane 66 - 82 ( 65 - 86) 

INTEGRAL Likelihood = -2.44 Transmembrane 184 - 200 ( 182 - 201) 

INTEGRAL Likelihood = -0.59 Transmembrane 119 - 135 ( 119 - 135)

Final Results 

bacterial membrane — Certainty=0 . 3781 (Affirmative) < suco 

bacterial outside — Certainty=0 . 0000 (Not Clear) < suco 

20 bacterial cytoplasm — Certainty=0 . 0000 (Not Clear) < suco 
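Each "Final Results" block lists a certainty value per compartment, and the predicted localisation is the compartment with the highest value. As a rough sketch, assuming the lines have the layout shown above (including the stray spaces inside the numbers), they could be parsed as follows. The function name and regular expression are illustrative assumptions only.

```python
import re

# Hypothetical parser; tolerant of stray spaces inside the certainty value.
_RESULT = re.compile(
    r"bacterial\s+(\w+).*?Certainty\s*=\s*([\d. ]+?)\s*\((.*?)\)"
)


def parse_final_results(lines):
    """Return a list of (compartment, certainty, verdict) tuples."""
    results = []
    for line in lines:
        match = _RESULT.search(line)
        if match:
            compartment, raw_value, verdict = match.groups()
            results.append((compartment, float(raw_value.replace(" ", "")), verdict))
    return results


final_results = parse_final_results([
    "bacterial membrane Certainty=0 . 3781 (Affirmative) < suco",
    "bacterial outside Certainty=0 . 0000 (Not Clear) < suco",
    "bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco",
])
# The compartment with the highest certainty is taken as the prediction.
best = max(final_results, key=lambda item: item[1])
print(best)
```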

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14704 GB:Z99118 glutamine ABC transporter (integral membrane 
protein) [Bacillus subtilis] 
Identities = 84/206 (40%) , Positives = 129/206 (61%) , Gaps = 6/206 (2%)

Query: 10 ILFLLQGFGLTLYISFISILLSMFFGTLLAIMRNSKNPIWKLIASIYIEFVRNVPNLLWI 69

+ FL GF +TLY++FISI+LS FFG + +R +K P+ + ++ +E +RN+P LL I 
Sbjct: 12 IAFLWDGFLVTLYVAFISIILSFFFGLIAGTIjRYAKVPVLSQLIAVLVETIRNLPLLLII 71 

Query: 70 FIIFLVF QMKSVSAGITSFTIFTSAALAEIIRGGLNGVDKGQTEAGLSQGFTYLQ 124 

F F +++ +A IT+ TIF SA L+EIIR GL +DKGQ EA S G +Y Q 

Sbjct: 72 FFTFFALPEIGIKLEITAAAITALTIFESAMLSEIIRSGLKSIDKGQIEAARSSGLSYTQ 131 

35 Query: 125 VFI III FPQAFRKMLPAI I SQFVTVI KDTSLLYSVIAI QEI FGKSQI LMGRYFEAGQVFT 184 

1+ PQA R+M+P I+SQF++++KDTSL VIA+ E+ +QI+ G+ + F 
Sbjct: 132 TLFFIVMPQALRRMVPPIVSQFISLLKDTSIAV-VIALPELIHNAQIINGQSADGSYFFP 190 

Query: 185 LYAIITAVYFITNFIISSFSRKLSKR 210 
40 ++ + +YF N+ +S +R+L R 

Sbjct: 191 IFLLAALMYFAVNYSLSLAARRLEVR 216 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1197> which encodes the amino acid 
sequence <SEQ ID 1198>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.51 Transmembrane 529 - 545 ( 517 - 551) 

INTEGRAL Likelihood =-10.30 Transmembrane 697 - 713 ( 693 - 719) 

INTEGRAL Likelihood = -4.41 Transmembrane 560 - 576 ( 555 - 585) 

INTEGRAL Likelihood = -0.32 Transmembrane 662 - 678 ( 662 - 678)

Final Results 

bacterial membrane Certainty=0. 5203 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco
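The INTEGRAL lines above give, for each predicted membrane-spanning segment, a likelihood score and the residue range of the segment. A minimal sketch for collecting these segments in sequence order is shown below; the function name and the tuple layout are assumptions made for illustration, and the parenthesised outer range is ignored.

```python
import re

# Hypothetical parser for lines such as
# "INTEGRAL Likelihood =-10.51 Transmembrane 529 - 545 ( 517 - 551)".
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_transmembrane(lines):
    """Return (start, end, likelihood) tuples sorted by start residue."""
    segments = []
    for line in lines:
        match = _TM.search(line)
        if match:
            likelihood, start, end = match.groups()
            segments.append((int(start), int(end), float(likelihood)))
    return sorted(segments)


segments = parse_transmembrane([
    "INTEGRAL Likelihood =-10.51 Transmembrane 529 - 545 ( 517 - 551)",
    "INTEGRAL Likelihood =-10.30 Transmembrane 697 - 713 ( 693 - 719)",
    "INTEGRAL Likelihood = -4.41 Transmembrane 560 - 576 ( 555 - 585)",
    "INTEGRAL Likelihood = -0.32 Transmembrane 662 - 678 ( 662 - 678)",
])
for start, end, likelihood in segments:
    print(f"{start}-{end} (length {end - start + 1}), likelihood {likelihood}")
```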






The protein has homology with the following sequences in the databases: 

>GP:BAA17584 GB:D90907 glutamine -binding periplasmic protein 
[Synechocystis sp.]

Identities = 153/475 (32%) , Positives = 251/475 (52%) , Gaps = 27/475 (5%) 

Query: 273 IVSDSSFAPFEFQN-GKGKWGIDIEL1KAIAKQQGFKIEIANPGFDAALNAVQSSQADG 331 
5 + ++ +F PFE + G+ G D++LI+AI + ++1 FD + A+QS+ 

Sbjct: 46 VATEPTFPPFEMTDEATGQLTGFDVDLIQAIGEAAQVTVDIQGYPFDGIIPALQSNTVGA 105 

Query: 332 VIAGATITDARKAIFDFSDPYYTSNI ILAVKAGKN- IKNYEDLDRKTVGAKNGTSSYSWL 390 
1+ TIT R FS PY+ S + +AV+ G + IKN +DL+ K + GT+ + + 

10 Sbjct: 106 AISAITITPERAQSVSFSSPYFKSVI1AIAVQDGNDTIKNLKDLEGKRI1AVAIGTTG-AMV 164 

Query: 391 KENAPKYGYNVKAFDDGSSMYDSLNSGSVDAIMDDEAVLKYAISQG- -RRFETPLEGI ST 448 

N P G V FD +S L +G+ DA+++D VL YAI R + + S 

Sbjct: 165 ATNVP - - GAKVTNFDS ITSALQEL VNGNADAVINDRP VIiLYAI KDAGLRNVKI SADVGSE 222 


Query: 449 GEVGFAVKKGTNPELI EMFNNGLAALKKSGQYDDIIDKYLDSKKA ATPSEKG 500 

G A+ E+ E+ N GL + ++G Y+ I +K+ K PS G 

Sbjct: 223 DYYGIAMPLAPPGEINQTREVLNQGLFQIIENGTYNAIYEKWFGEKNPPFLPLVAPSLVG 282 

20 Query: 501 ADESTISGLLSNNYKQLLAGLGTTLSLTLISFAIAIIIGIIFGMMAVSP 549 

+ + L ++ L G T+ LT S +1 G + +S 

Sbjct: 283 KVGTAQSLTERSQANPNDNFLITLFRNLFKGSILTVLLTAFSVFFGLIGGTGVAIALISD 342 

Query: 550 TKSLRLISTVFVDVVRGIPLMIVAAFIFWGVPNLIESMTGHQSPINDFLAATIALSLNGG 609 
25 K L+LI ++V+ RG P+++ I++G+P L + + G 1+ F AA IALSLN 

Sbjct: 343 IKPLQLIFRIYVEFFRGTPMLVQLFIIYFGLPALFKEI-GLGITIDRFPAAIIALSI^NVA 401 

Query: 610 AYIAEIVRGGIEAVPAGQMEASRSLGLSYGTTMRKVILPQAVKLMLPNFINQFVISLKDT 669 
AY+AEI+RGGI+++ GQ EA SLG+S TM++VI PQA + +LP N+F+ +KDT 
30 Sbjct: 402 AYLAEIIRGGIQSIDQGQWEACESLGMSPWQTMKEVIFPQAFRRILPPLGMEFITLIKDT 461 

Query: 670 T I VSAIGLVELFQTGKI 1 I ARNYQSFRMYAILAI I YLIMI ILLTRLAKRLEKRLN 724 

++ + IG ELF+ G++I+A Y++F +Y +A++YL++ + + + K LE ++ 
Sbjct: 462 SLTAVIGFQELFREGQLIVATTYRAFEVYIAVALVYLLLTTISSFVFKWLENYMD 516 
35 Identities = 68/247 (27%) , Positives = 106/247 (42%) , Gaps = 11/247 (4%) 

Query: 7 VLLLAIMSIFLTCNIASAETIAIVSDTAYAPFEFKD--SDQIYKGIDVDIINEVAKRQSW 64 

VLL + + + S +TI + ++ + PFE D + Q+ G DVD+I + + 
Sbjct: 24 VLLAIAIPLLPAFSQVSRQTIIVATEPTFPPFEMTDEATGQL-TGFDVDLIQAIGEAAQV 82 


Query: 65 DFSMSFPGFDAAWAVQSGQASALMAGTTITNARKKVFHFSEPYYDTKIVIATRKAN-AI 123 

+ FD + A+QS A ++ TIT R + FS PY+ + + IA + N I 
Sbjct: 83 TVDIQGYPFDGIIPALQSJSITVGAAISAITITPERAQSVSFSSPYFKSVLAIAVQDGNDTI 142 

45 Query: 124 KKYSDLKGKTVGVKNGTAAQAFLISlNYKKKYDYTVKTFDTGDLMYNSLSAGSIAAVMDDEA 183 
K DL+GK + V GT N V FD+ I, G+ AV++D 

Sbjct: 143 KNLKDLEGKRIAVAIGTTGAMVATNVP GAKVTNFDSITSALQELVNGNADAVINDRP 199 

Query: 184 VIQYAIS QNQDIAINMKGEPIGSFGFAVKKGSGYDYLVNDFNTALKAMKADGTYQA 239 

50 V+ YAI +N 1+ ++ E + + N L + +GTY A 

Sbjct: 200 VLLYAIKDAGLRWKISADVGSEDYYGIAMPLAPPGEINQTREVLNQGLFQIIENGTYNA 259 

Query: 240 IMTKWLG 246 
I KW G 

55 Sbjct: 260 IYEKWFG 266 
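The middle line of each alignment block repeats a residue wherever the Query and Sbjct rows are identical, and the Identities count is the number of such columns. The sketch below reproduces this for the last block above (IMTKWLG against IYEKWFG). It is a simplified illustration with hypothetical function names: it does not compute the "+" marks for conservative substitutions, which would require a substitution matrix.

```python
def count_identities(query, subject, gap_char="-"):
    """Count aligned columns in which both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != gap_char)


def identity_midline(query, subject, gap_char="-"):
    """Build a middle line showing identical residues only (no '+' marks)."""
    return "".join(
        q if q == s and q != gap_char else " " for q, s in zip(query, subject)
    )


query_row = "IMTKWLG"
subject_row = "IYEKWFG"
print(count_identities(query_row, subject_row))
print(repr(identity_midline(query_row, subject_row)))
```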

An alignment of the GAS and GBS proteins is shown below:

Identities = 68/210 (32%) , Positives = 113/210 (53%) , Gaps = 16/210 (7%) 

60 Query: 13 LLQGFGLTLYISFISILLSMFFGTLl^IMRNSKNPIWKLIASIYIEFVRNVPNLLWIFII 72 

LL G G TL ++ IS +++ G + +M S +LI++++++ VR +P ++ I 

Sbjct: 517 LLAGLGTTLSLTLISFAIAIIIGIIFGMMAVSPTKSLRLISTVFVDWRGIPLMIVAAFI 576 

Query: 73 F LVFQMKSVSAGITSFTIFT SAALAEIIRGGLNGVDKGQTEAGLSQGF 120 

65 " FL+M+IFT A +AEI+RGG+ V GQ EA S G 

Sbjct: 577 FWGVPNLIESMTGHQSPINDFLAATIALSLNGGAYIAEIVRGGIEAVPAGQNEASRSLGL 636 

Query: 121 TYLQVFIIIIFPQAFRKMLPAIISQFVTVIKDTSLLYSVIAIQEIFGKSQILMGRYFEAG 180

+Y +1 PQA + MLP I+QFV +KDT+++ S I + E+F R + 
Sbjct: 637 SYGTTMRKVILPQAVKLMLPNFINQFVISLKDTTIV-SAIGLVELFQTGKIIIARNY 692 

5 Query: 181 QVFTLYAIITAVYFITNFIISSFSRKLSKR 210 

Q F +YAI+ +Y I +++ +++L KR 
Sbjct: 693 QSFRMYAILAIIYLIMIILLTRLAKRLEKR 722 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 368 

A DNA sequence (GBSx0399) was identified in S.agalactiae <SEQ ID 1199> which encodes the amino 
acid sequence <SEQ ID 1200>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-12.21 Transmembrane 7 - 23 ( 1-30) 

Final Results 

bacterial membrane Certainty=0 . 5883 (Affirmative) < suco 

20 bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04094 GB:AP001508 unknown conserved protein in B. subtilis 
[Bacillus halodurans]

Identities = 43/157 (27%) , Positives = 83/157 (52%) , Gaps = 9/157 (5%) 

Query: 26 YQSQFQKTTNQA1AIAYKDAKVAKK--DVIHQKIDKEFENFRGSYEIEFNTKSAEYSYHV 83 
+Q++ N+ L +A ++ + + + +K+ +N R YEIE EY + + 

30 Sbjct: 38 HQAESVSADNEGLTIAEASDIALERAGNGVVTEAEKDRDNGRVVYEIEVKNDDDEYDFKI 97 






Query: 84 DVKTGQILERDMDNNGFSKSTSQSSSSSSQKSHKISQEEAKKIAFKDANIEESEVSNLKI 143 

D +TG+IL+ + SK SSS ++ IS +EAK+IA K+ + ++ ++++ 
Sbjct: 98 DQQTGEILKEKQEQRKGSKPREGHSSSKGSEA-VISMDEAKEIALKEVS GKIDDIEL 153 

Query: 144 KEEIENGKSVYDIDF - VDLKNKNEVDYQIDAETGKI I 179 

E ENG VY+++ D + ++V +DA TG ++ 
Sbjct: 154 --ERENGSLvYEVEIESDHYDDDDVTvYVDAMTGNVL 188 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1201> which encodes the amino acid
sequence <SEQ ID 1202>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -5.15 Transmembrane 42 - 58 ( 41 - 60)

Final Results 

bacterial membrane Certainty=0. 3060 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

50 bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below: 






Identities = 37/96 (38%) , Positives = 63/96 (65%) , Gaps = 5/96 (5%) 

Query: 94 DMDNNGFSKSTSQSSSSSSQKSHKISQEEAKKIAFKDANIEESEVSNLKIKEEIENGKSV 153 

DMD+ +Q +S + K K+S+++AK IA KDA++ E++ L + ++ E+GK+V 

Sbjct: 59 DMDDKD-DHI^NQPKTSQTSKKVKLSEDKAKSIALKDASVTEADAQMLSVTQDNEDGKAV 117 

Query: 154 YDIDFVDLKNKN-EVDYQIDAETGKIIERSRDHMND 188

Y+I+F +NK+ E Y IDA +G I+E+S + +ND 
Sbjct: 118 YEIEF---QNKDQEYSYTIDANSGDIVEKSSEPIND 150 
Identities = 23/62 (37%) , Positives = 37/62 (59%) 



Query: 35 NQALAIAYKDAKVAKKDVIHQKIDKEFENPRGSYEIEFNTKSAEYSYHVDVKTGQILERD 94 

++A +IA KDA V + D + ++ E+ + YEIEF K EYSY +D +G I+E+ 

Sbjct: 85 DKAKSIALKDASVTEADAQMLSVTQDNEDGKAVYEIEFQNKDQEYSYTIDANSGDIVEKS 144 

Query: 95 MD 96 
+ 

Sbjct: 145 SE 146 



A related GBS gene <SEQ ID 8563> and protein <SEQ ID 8564> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: 14.45 
GvH: Signal Score (-7.5): -5.92 

Possible site: 39 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 1 value: -8.92 threshold: 0.0 

INTEGRAL Likelihood = -8.92 Transmembrane 7 - 23 ( 2-28) 
PERIPHERAL Likelihood = 10.93 37 
modified ALOM score: 2.28 



*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.4567 (Affirmative) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the databases: 



26.1/59.2% over 140aa 
Bacillus subtilis 

EGAD | 107494 | hypothetical protein Insert characterized 
GP|2632048|emb|CAA05607.l| |AJ002571 YkoJ Insert characterized 

GP| 2633682 |emb|CAB13185.l] |Z99110 similar to hypothetical proteins from B. 
Insert characterized 

PIR| F69859 | F69859 conserved hypothetical protein ykoJ - Insert characterized 



subtilis 



ORF00925(379 - 852 of 1164) 

EGAD|l07494|BS1329(29 - 169 of 170) hypothetical protein {Bacillus subtilis} 
GP|2632048|emb|CAA05607.l| |AJ002571 YkoJ {Bacillus subtilis} 

GP|2633682|emb|CAB13185.l| |Z99110 similar to hypothetical proteins from B. subtilis 
{Bacillus subtilis} PIR| F69859 | F69859 conserved hypothetical protein ykoJ - Bacillus 
subtilis 
%Match =6.2 

%Identity =26.1 %Similarity =59.2 

Matches = 37 Mismatches = 52 Conservative Sub.s = 47 



297 327 357 387 417 447 468 498 

NIIE**KEGCCMIKKNKVFLEVDLvLWILEGGVLFYQSQFQKTTNQA^ 

I :| I == :: === |= ::|> I |: : ||:>: | s 

MLKIQ<MWGLIAGCLAAGGFSYNAFATENNENRQASSKTDALTEQEAEAIAKTVVDGTVEDIDRDLYNG^ 
10 20 30 40 50 60 70 



528 558 588 618 648 672 702 732 

SYEIEFNTKSAEYSYHVDVKTGQILERDMDNNGFSKSTSQSSSSSSQKSHK- - ISQEEAKKIAFKDANIEESEVSNLKIK 



VYEVEIEKEGEDYDVYVDIHTKQALNDPL KEKAEQVAITKEEAEEIALKQTG- - -GTVTESKLD 

90 100 110 120 130 

762 792 822 852 882 912 942 972

EEIENGKSVYDIDFVDLKNKtffiVDYQIDAETGKIIERSRDH^IND*FK*DIKKRRSKRPSF*LLSSLLPTF*KFT*KT*DD 

|: :| :|::: = I I = = = l h l = lh= I 
ED- -DGAYIYEME- IQTKQGTETEFEISAKDGRIIKQEIDD 
140 150 160 170 

SEQ ID 8564 (GBS37) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 14 (lane 4; MW 22kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 16 (lane 10; MW 47kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 369 

A DNA sequence (GBSx0400) was identified in S.agalactiae <SEQ ID 1203> which encodes the amino 
acid sequence <SEQ ID 1204>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0 . 1499 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9709> which encodes amino acid sequence <SEQ ID 9710> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1205> which encodes the amino acid 
sequence <SEQ ID 1206>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 2808 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 128/297 (43%) , Positives = 180/297 (60%) , Gaps = 9/297 (3%) 

Query: 54 IDDIKVGSPIFKYFWT-SLSLQAPLKALEFVLEQAKMPTELSGELSETQYLVAQFSDELA 112 

I D ++GSP F W Q+ + L F+L+ +MP ELSG+L ETQ L+ +F L 

Sbjct: 46 IIDNRLGSPTFWVIWPIEKENQSAKQLLTFLLDLVEMPFELSGQLHETQTLLTRFHPSLL 105 

Query: 113 PHDDFWIALSQVIYDSFPGNSIAEDTVIjNRKLHQFRYLISSQQAQYVRRYFKDVGMTDRD 172 

P FW L+ ++ +FPG +L++ L ++LHQFRY+ IS SQQAQ +R ++K + MTD 
Sbjct: 106 PDHMFWKELASLVDQAFPGKTLSQAGELEKRLHQFRYVISSQQAQSIRNHYKMIEMTDAQ 165 

Query: 173 ALVNYL SCL-REPDSIAYYESARLHNKRRRNGEIFGFPDDEPVINSK1LISFHTE 226 

AL +L CL R+ +SARLHNK R FP E N K+L+ FHTE 

Sbjct: 166 ALALFLRSKKGPCLWRQAPDYTLMDSARLHNKLRFEDNKVIFPSQEVSYNIKVLLWFHTE 225 

Query: 227 FIIDDKGNFIOTIDAEVITRNGIINGASFNYAFKNNTRHKELDVDPVK-LDPKFRNDMTR 285 

F +D G FRffi+DAEV+T GI+NGASFNY + RH +LDVDP+ DP+FR D + 
Sbjct: 226 FTLDSTGFFI^VDAEVVTEKGIWGASFNYG-TDGPRHWDLDVDPISHHDPQFRRDTLK 284 

Query: 286 GYRSPNLSRRKWFFFKEEDYDCSYFNKKGYYAFGRRSAKQSVDKQVKYLKKAVQKMR 342

G+RSP R+WF +++D+ SYFN KG +A+ +S+ V K K K+ + ++ 
Sbjct: 285 GFRSPKRVFRQWFRAQKDDFMFSYFNAKGLFAYHNKSSFARVKKSAKQFKRQIHPIK 341 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 370 

A DNA sequence (GBSx0401) was identified in S.agalactiae <SEQ ID 1207> which encodes the amino 
acid sequence <SEQ ID 1208>. This protein is predicted to be similar to two-component response regulator 
[YcbM] (ompr-like protein). Analysis of this protein sequence reveals the following:

Possible site: 31 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3129 (Affirmative) < suco

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

20 >GP:CAA55264 GB:X78502 gtcR [Brevibacillus brevis] 

Identities = 99/228 (43%) , Positives = 149/228 (64%) , Gaps = 3/228 (1%) 

Query: 2 RTVLVVQGDDETIELLRSYLEGALYKVVMASDGEEAFSLFQQHQIDLAIIDITLPKIDGY 61
+T+L+ + E IELL+ +LE Y+++ A DGE+A++ +QH +DLAIIDI +P +DG+
Sbjct: 3 KTILIADDEPEIIELLKLFLERESYRIIEAYDGEQAMNY1RQHFVDLAIIDIMMPALDGF 62

Query: 62 ELTRLIRQDSQIPIIMLAAKTTDMDRILGLNIGADDFITKPFNSLEVLARINSQLRRYYE 121
+L + + + ++P+I+L+AK D D+ILGL +GADDFI+KPFN LE +ARI +QLRR +E
Sbjct: 63 QLIKRLTSEYKLPVIILSAKNRDSDKILGLGLGADDFISKPFNPLEAVARIOAQLRRAFE 122

Query: 122 FNSLAKP--KNQFIKIGELELDEEHVELTKNGKHIKLTATEFKILHILMS-SPGRIYTKT 178
FN + Q+GLL ++++T E+++L+ M S I+TK
Sbjct: 123 [illegible]

Query: 179 [illegible]
QL+E+ D+ TIMV IS +RDKIED + P YIKT+RG+GYK
Sbjct: 183 QLFEQAWSETYWEDDNTIMVQISRLRDKIEDQPRQPVYIKTVRGLGYK 230

There is also homology to SEQ ID 1182:

Identities = 87/230 (37%) , Positives = 144/230 (61%) , Gaps = 5/230 (2%)

Query: 1 MRTVLVVQGDDETIELLRSYLEGALYKVVMASDGEEAFSLFQQHQIDLAIIDITLPKIDG 60
M+ +L+V + ++++ L Y +V A DG EA ++F++ + DL I+D+ LP++DG
Sbjct: 1 MKKILIVDDEKPISDIIKFNLTKEGYDIVTAFDGREAVTIFEEEKPDLIILDLMLPELDG 60

Query: 61 YELTRLIRQDSQIPIIMLAAKTTDMDRILGLNIGADDFITKPFNSLEVLARINSQLRRYY 120
E+ + IR+ S +PIIML+AK ++ D+++GL IGADD++TKPF++ E+LAR+ + LRR
Sbjct: 61 LEVAKEIRKTSHVPIIMLSAKDSEFDKVIGLEIGADDYVTKPFSNRELLARVKAHLRRTE 120

Query: 121 EFNSLAKPKN QFIKIGELELDEEHVELTKNGKHIKLTATEFKILHILMSSPGRIY 175
+N Q + IG L++ + K+G+ ++LT EF++LH h + G++
Sbjct: 121 [illegible]

Query: 176 [illegible]
T+ L E + G GD T+ V + +R+KIED P+YI T RGVGY
Sbjct: 181 [illegible]



Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 371

A DNA sequence (GBSx0402) was identified in S.agalactiae <SEQ ID 1209> which encodes the amino 
acid sequence <SEQ ID 1210>. This protein is predicted to be threonyl-tRNA synthetase 1 (thrS). Analysis 
of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 2353 (Affirmative) < suco 

10 bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06860 GB:AP001517 threonyl-tRNA synthetase 1 [Bacillus halodurans] 
Identities = 413/638 (64%), Positives = 506/638 (78%), Gaps = 7/638 (1%)

Query: 1 MIKITFPDGAIREFESGITTFEIAQSISNSLAKKAIAGKFNGQLIDTTRAIEEDGSIEIV 60 

MI ITFPDGA++EF G TT EIA SIS L KKAIAG +G L+D IE+DG+I IV 
Sbjct: 4 MINITFPDGAVKEFPKGTTTAEIAGSISPGLKKKALAGMLDGTLLDLNTPIEQDGTITIV 63 


Query: 61 TPDHEDALGVLRHSAAHLFAQAAKRLFPD- -LCLGVGPAIQDGFYYDTDNKSGQISNDDL 118 

TP+ ++AL VLRHS AH+ AQA KRLF D + LGVGP 1+ GFYYD D ++ +DL 

Sbjct: 64 TPESDEALEVLRHSTAHVMAQALKRLFKDRNVKLGVGPVIEGGFYYDVDMDES-LTPEDL 122 

25 Query: 119 PRIEEEMKKIVKENHPCIREEISKEEALELFKD- - DPYKVELI SEHAEDG - LTVYRQGEF 175 

P+IE+EMKKI+ EN P R + S+EEAL +++ DPYK+ELI++ ED +T+Y QGEF 
Sbjct: 123 PKIEKEMKKIIGENLPIERVWSREEALaRYEEVGDPYKIELINDLPEDETITIYEQGEF 182 

Query: 176 VDLCRGPRVPSTGRIQVFHLLNVAGAYWRGNSDNAMMQRVYGTAWFDKKDLKAYLKRREE 235 
30 DLCRG HVPSTG+++ F LLN+AGAYWRG+S N M+QR+YGTA+F K DL> +L+ EE 

Sbjct: 183 FDLCRGVHVPSTGKLKEFKLLNIiAGAYWRGDSSNKMLQRIYGTAFFKKADLDEHLRLLEE 242 

Query: 236 AKERDHRKLGKELDLFMVNPEVGQGLPFWLPNGATIRRELERYIVDKEIASGYQHVYTPP 295 
AKERDHRKLGKEL +F ++ +VGQGLP WLP GATIRR +ERYIVDKE GYQHVYTP 
35 Sbjct: 243 AKERDHRKLGKELGIFALSQKVGQGLPLWLPKGATIRRIIERYIVDKEEKLGYQHVYTPV 302 

Query: 296 ^SVEFYKTSGHWDHYREDMFPTMDMGDGEEFVLRPMNCPHHIEVYKHHVHSYRELPIRI 355 

+AS E YKTSGHWDHY++DMFPTM+M + EE VLRPMNCPHH+ VYK + SYR+LP+RI 
Sbjct: 303 IASSELYKTSGHWDHYKDDMFPTMEM-ENEELVLRPMNCPHHMMVYKTEMRSYRQLPLRI 361 


Query: 356 AELGMMHRYEKSGALTGLQRTOEMTLNDAHIBVTPEQIKDEFLKAIJILIAEIYEDFNLTD 415 

AELG+MHRYE SGA+ +GLQRVR MTIiNDAHIF P+QIKDEF++ + LI +YEDF L + 
Sbjct: 362 AELGLMHRYEMSGAVSGLQRVRGMTLNDAH IFCRPDQI KDEFVRWRL I QAVYEDFGLKN 421 

45 Query: 416 YRFRLSYRDPEDKHKYYDNDEMWENAQAMLKEAMDDFGLDYFEAEGEAAFYGPKLDIQVK 475 

Y FRLSYRDPEDK KY+D+D MW AQ MLKEAMD+ L+YFEAEGEAAFYGPKLD+QV+ 
Sbjct: 422 YSFRLSYRDPEDKEKYFDDDNMVWKAQGMLKEAMDELELEYFEAEGEAAFYGPKLDVQVR 481 

Query: 476 TALGNEETLSTIQLDFLLPERFDLKYIGADGEEHRPIMIHRGGISTMERFTAILIETYKG 535 
50 TALG +ETLST+QLDFLLPERFDL Y+G DG+ HRP+++HRG +STMERF A L+E YKG 

Sbjct: 482 TALGKDETLSTVQLDFLLPERFDLTYVGEDGQPHRPWVHRGWSTMERFVAFLLEEYKG 541 

Query: 536 AFPTWLAPQQVSVIPISNEAHIDYAWEVARVLKDRGIRAEVDDRNEKMQYKIRAAQTQKI 595 
AFPTWLAP QV VIP+S EAH++YA V L+ GIR E+D+R+EK+ YKIR AQ QKI 
55 Sbjct: 542 AFPTWIAPVQVQVIPVSPEAHLEYAKNVQETLQQAGIRVEIDERDEKIGYKIREAQMQKI 601 

Query: 596 PYQLIVGDKEMEEKAVNVRRYGSKATETKSIEEFVESI 633 

PY L++GDKE+E VNVR+YG K + + ++EFV + 
Sbjct: 602 PYMLVLGDKEVEANGVNVRKYGEKDSSSMGLDEFVRHV 639 


A related DNA sequence was identified in S.pyogenes <SEQ ID 1211> which encodes the amino acid
sequence <SEQ ID 1212>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 2566 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 564/644 (87%) , Positives = 608/644 (93%) 



Query: 1 MIKITFPDGAIREFESGITTFEIAQSISNSLAKKAIAGKFNGQLIDTTRAIEEDGSIEIV 60
MIKITFPDGA+REFESG+TTF+IA+SIS SLAKKALAGKFN QLIDTTRAIEEDGSIEIV
Sbjct: 1 [illegible] 60

Query: 61 TPDHEDALGVLRHSAAHLFAQAAKRLFPDLCLGVGPAIQDGFYYDTDNKSGQISNDDLPR 120
TPDH+DA VLRHS AAHLFAQAAKRLFP +L LGVGPAI +GFYYDTDN GQISN+DLPR
Sbjct: 61 [illegible] 120

Query: 121 IEEEMKKIVKENHPCIREEISKEEALELFKDDPYKVELISEHAEDGLTVYRQGEFVDLCR 180
IE EM+KIV EN+PCIREE++KEEALELFKDDPYKVELI+EHA GLTVYRQGEFVDLCR
Sbjct: 121 [illegible] 180

Query: 181 GPHVPSTGRIQVFHLLNVAGAYWRGNSDNAMMQRVYGTAWFDKKDLKAYLKRREEAKERD 240
GPHVPSTGRIQVFHLLNVAGAYWRGNSDN MMQR+YGTAWFDKKDLKAYL R EEAKERD
Sbjct: 181 [illegible] 240

Query: 241 HRKLGKELDLFMVNPEVGQGL[illegible] 300
HRKLGKELDLFM++ EVGQGLPFWLP+GATIRR LERYI D KE +ASG YQHVYTP P+AS VE
Sbjct: 241 [illegible] 300

Query: 301 FYKTSGHWDHYREDMFPTMDMGDGEEFVLRPMNCPHHIEVYKHHVHSYRELPIRIAELGM 360
YKTSGHWDHY+EDMFP MDMGDGEEFVLRPMNCPHHI +VYK+HV SYRELPIRIAELGM
Sbjct: 301 [illegible] 360

Query: 361 [illegible] 420
MHRYEKSGAL+GLQRVREMTLND HIFVTPEQI++EF +AL LI ++Y DFNLTDYRFRL
Sbjct: 361 MHRYEKSGALSGLQRVREMTLNDGHIFVTPEQIQEEFQRALQLIIDVYADFNLTDYRFRL 420

Query: 421 SYRDPEDKHKYYDNDEMWENAQAMLKEAMDDFGLDYFEAEGEAAFYGPKLDIQVKTALGN 480
SYRDP D HKYYDNDEMWENAQ+MLK A+D+ G+DYFEAEGEAAFYGPKLDIQVKTALGN
Sbjct: 421 SYRDPNDTHKYYDNDEMWENAQSMLKAALDEMGVDYFEAEGEAAFYGPKLDIQVKTALGN 480

Query: 481 EETLSTIQLDFLLPERFDLKYIGADGEEHRPIMIHRGGISTMERFTAILIETYKGAFPTW 540
EETLSTIQLDFLLPERFDLKYIGADGEEHRP+MIHRG ISTMERFTAILIETYKGAFPTW
Sbjct: 481 EETLSTIQLDFLLPERFDLKYIGADGEEHRPVMIHRGVISTMERFTAILIETYKGAFPTW 540

Query: 541 LAPQQVSVIPISNEAHIDYAWEVARVLKDRGIRAEVDDRNEKMQYKIRAAQTQKIPYQLI 600
LAP QV+VIPISNEAHIDYAWEVA+ L+DRG+RA+VDDRNEKMQYKIRA+QT KIPYQLI
Sbjct: 541 LAPHQVTVIPISNEAHIDYAWEVAKTLRDRGVRADVDDRNEKMQYKIRASQTSKIPYQLI 600

Query: 601 VGDKEMEEKAVNVRRYGSKATETKSIEEFVESILADIARKSRPD 644
VGDKEME+K+VNVRRYGSK T T+S+EEFVE+ILADIARKSRPD
Sbjct: 601 VGDKEMEDKSVNVRRYGSKTTHTESVEEFVENILADIARKSRPD 644





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 372 

A DNA sequence (GBSx0403) was identified in S.agalactiae <SEQ ID 1213> which encodes the amino 
acid sequence <SEQ ID 1214>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1985 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA72250 GB:Y11463 0RF5 [Streptococcus pneumoniae] 
Identities = 189/290 (65%) , Positives = 234/290 (80%) 

Query: 1 MRIGLFTDTYFPQVSGVSTSIRTLKEGLEKEGHKVYIFTTTDRNVKRFEDPTIIRLPSVP 60 

MRIGLFTDTYFPQ VSG V+TS IRTLK LEK+GH V+IFTTTD++V R+ED IIR+PSVP 
Sbjct: 1 MRIGLFTDTYFPQVSGVATSIRTLKTELEKQGHAVFIFTTTDKDVNRYEDWQIIRIPSVP 60 

Query: 61 FISFTDRRVVYRGLISAYRIAKDYELDIIHTQTEFSLGLLGKLVAKALRIPVVHTYHTQY 120 

F +F DRR YRG A IAK Y+LDIIHTQTEFSLGLLG +A+ L+ 1 PV+HTYHTQY 
Sbjct: 61 FFAFKDRRFAYRGFSKALEIAKQYQLDI IHTQTEFSLGLLGIWIARELKI PVIHTYHTQY 120 

Query: 121 EDYVGYIAKGKLI KPSMVKYIMRTYLSDLDGVICPSRIVIMLLDGYGVKI PKQVIPTGI P 180 

EDYV YIAKG LI+PSMVKY++R +L D+DGVICPS IV +LL Y VK+ K+VIPTGI 
Sbjct: 121 EDYVHYIAKGMLIRPSWKYLWGFLHDVDGVICPSEIVRDLLSDYKVKVEKRVIPTGIE 180 

Query: 181 VENYRREDISEETIKNLRTELGLADNDTMLLSLSRVSFEKNIQAILMHLSAVVDENPHVK 240 

+ + R +1 +E +K LR++LG+ D + LLSLSR+S+EKNIQA+L+ + V+ E VK 
Sbjct: 181 LAKFERPEIKQENLKELRSKLGIQDGEKTLLSLSRISYEKNIQAVLVAFADVLKEEDKVK 240 

Query: 241 LVIVGDGPYLSDLKELVHSLELENSVIFTGMVEHSQVAIYYKACDFFISA 290 

LV+ GDGPYL+DLKE +LE+++SVIFTGM+ S+ A+YYKA DFFISA 
Sbjct: 241 LWAGDGPYLNDLKEQAQNLEIQDSVIFTGMIAPSETALYYKAADFFISA 290 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1215> which encodes the amino acid
sequence <SEQ ID 1216>. Analysis of this protein sequence reveals the following:

Possible site: 17 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1074 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below:

Identities = 309/444 (69%) , Positives = 370/444 (82%)

Query: 1 MRIGLFTDTYFPQVSGVSTSIRTLKEGLEKEGHEVYIFTTTDRNVKRFEDPTIIRLPSVP 60
MRIGLFTDTYFPQVSGV+TSIRTLKE LEKEGHEVYIFTTTDR+VKRFEDPTIIRLPSVP
Sbjct: 1 MRIGLFTDTYFPQVSGVATSIRTLKEELEKEGHEVYIFTTTDRDVKRFEDPTIIRLPSVP 60

Query: 61 FISFTDRRVVYRGLISAYRIAKDYELDIIHTQTEFSLGLLGKLVAKALRIPVVHTYHTQY 120
F+SFTDRRVVYRGLIS+Y+IAK Y LDIIHTQTEFSLGLLGK++ KALRIPVVHTYHTQY
Sbjct: 61 FVSFTDRRVVYRGLISSYKIAKHYNLDIIHTQTEFSLGLLGKMIGKALRIPVVHTYHTQY 120

Query: 121 EDYVGYIAKGKLIKPSMVKYIMRTYLSDLDGVICPSRIVLNLLDGYGVKIPKQVIPTGIP 180
EDYV YIA GK+I+PSMVK ++R YL DLDGVICPSRIVkNLL+GY V IPK+VIPTGIP
Sbjct: 121 EDYVSYIANGKIIRPSMVKPLLRGYLKDLDGVICPSRIVEjNLLEGYEvTIPKRVIPTGIP 180

Query: 181 VENYRREDISEETIKNLRTELGLADNDTMLLSLSRVSFEKNIQAILMHLSAVVDENPHVK 240
+E Y R+DI+ E + NL+ ELG+A ++TMLLSLSR+S+EKNIQAI+ + A++ EN +K
Sbjct: 181 LEKYIRDDITAEEVTNLKAELGIAGDETMLLSLSRISYEKNIQAIINQMPAIIAENAKIK 240

Query: 241 LVIVGDGPYLSDLKELVHSLELENSVIFTGMVEHSQVAIYYKACDFFISASTSETQGLTY 300
L+IVG+GPYL DLK L LE++ V FTGMV H +VA+YYKACDFFISASTSETQGLTY
Sbjct: 241 LIIVGNGPYLQDLKHIAMQLEVDKHVTFTGMVPHDKVALYYKACDFFISASTSETQGLTY 300

Query: 301 IESLASGRPIIAQSNPYLDDVISDKMFGTLYKKESDLADAILDAIAETPKMTQEAYEQKL 360
IESLASG PIIA NPYLDDV++DKMFGTLY E+DL DAI+DAI +TP M + +K
Sbjct: 301 IESLASGTPIIAHGNPYLDDVOTDKMFGTLYYAETDLTDAIIDAILKTPVMDKRLLAKKR 360

Query: 361 YEISAENFSKSVYAFYLDFLISQKASVKEKVSLTIGNKDSHSTLRFVRKA.VYLPKKVFTF 420 

YEISA++F KS+Y FYLD LI++ + +K+SL + + S+L+ V+ A++LPK+ 
Sbjct: 361 YEISAQHFGKSIYTFYLDTLIARNSKEAQKLSLYLIfflSGKBSSIjltaiiVQGAIHLPKEAAKV' 420 

Query: 421 TGRASKKWKAPKRRISSIRDFLD 444 

T S KWKAP + + +I+DFLD 
Sbjct: 421 TAITSVKWKAPIKLVHAIKDFLD 444 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 373 

A DNA sequence (GBSx0404) was identified in S.agalactiae <SEQ ID 1217> which encodes the amino 
acid sequence <SEQ ID 1218>. This protein is predicted to be lipopolysaccharide biosynthesis protein-related
protein. Analysis of this protein sequence reveals the following:

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 .4076 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG19110 GB:AE005009 Vng0600c [Halobacterium sp. NRC-1] 
Identities = 117/350 (33%) , Positives = 178/350 (50%) , Gaps = 29/350 (8%) 



Query: 1 MKVLLYLEAEEYLKKSGIGRAIKHQEKALQIASIDYTTNPT 41
M+ L YLEA E L+ G+ A Q AL+ ++ P
Sbjct: 2 MRALNYLEAAEALR~GG^WTATNQQRAALETTDvEVvETPWRAGDPVRSIGSLAAGGSCF 60

Query: 42 DDFDLVHMNTYGIRSWLLMSKAKKTGKKVIMHGHSTEEDFRNSFIGSNLVSPLFKWYLCR 101
FD+ H N G S + A++T +++H H T EDF SF GS+ ++P + YL
Sbjct: 61 TAFDVAHCNLVGPGSVAVARHARRTDTPLVLHAHLTREDFAQSFRGSSTIAPALEPYLRW 120

Query: 102 FYQKADAIITPTDYSKQLIKAYGIKKPIFVLSNGIDLSRYQRSEKKESAFRHYFHLSKDD 161
FY +AD ++ P++Y+K +++AY + PI LSNG+DL Q E + R F L D
Sbjct: 121 FYSQADLVLCPSEYTKDVLRAYPVDAPIRQLSNGVDLESMQGYESFRADTRARFDL--DG 178

Query: 162 KVVMGAGLYFMRKGIDQFVEVAAKMPDIRFIWFGETNKWVIPRKVRQIVTKQHPSNVTFA 221
VV G F RKG+ F E+ AK D F WFG ++ + P+NVTF
Sbjct: 179 TVVYAVGEVFERKGLTMFCEL-AKATDHEFAWFGPYDEGPQAGAATRKWVADPPANVTFT 237

Query: 222 GYIKGDVYEGAMSASDAFFFPSREETEGIVVLEALASHQHVVLRDIPVYHGWVTE-DSVE 280
GY++ A A D + FP++ E +GI VLEA+A + VVLRDIPV+ + T+ +
Sbjct: 238 GYMEDK--RAAFGAGDIYLFPAKVENQGIAVLEAMACGKPVVLRDIPVFREFFTDGEDCL 295

Query: 281 LATDVDGFVEKLDKVLSGKSDKIKEGYH VAESRSIERIAHELASVYQ 327
+ + + F + +D++ + + G + AES S++RI ELAS+Y+
Sbjct: 296 MCSTFEAFRDAIDRLADDPELRTRLGENARETAESHSLDRIGEELASIYE 345





A related DNA sequence was identified in S.pyogenes <SEQ ID 1219> which encodes the amino acid 
sequence <SEQ ID 1220>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm Certainty=0.4088 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



An alignment of the GAS and GBS proteins is shown below:

Identities = 236/332 (71%) , Positives = 276/332 (83%) 





Query: 1 MKVLLYLEAEEYLKKSGIGRAIKHQEKALQIAGIDYTTNPTDDFDLVHMNTYGIRSWLLM 60
MKVLLYLEAE YL+KSGIGRAIKHQ KAL + G +TTNP + +DLVH+NTYG++SWLLM
Sbjct: 1 MTT^LYLEaENYLRKSGIGRAIKHOAKALSLVGnHFTTNPRETYDLuHIJsITYGLKSWLLM 60

Query: 61 SKAKKTGKKVIMHGHSTEEDFRNSFIGSNLVSPLFKWYLCRFYQKADAIITPTDYSKQLI 120
KA+K GKKVIMHGHSTEEDFRNSFI SNL+SP FK YLC FY KADAIITPT YSK LI
Sbjct: 61 IKROK'ftnKKVIMHRHSTEEDFRNSFIFSNLLSPWFKKYLCHFYNKADAIITPTLYSKSLI 120

Query: 121 KAYGIKKPIFVLSNGIDLSRYQRSEKKESAFRHYFHLSKDDKVVMGAGLYFMRKGIDQFV 180
++YG+K PIF +SNGIDL +Y KKE+AFR YF + + +KVVMGAGL+F+RKGID FV
Sbjct: 121 ESYGVKSPIFAVSNGIDLEQYGADPKKEAAFRRYFDIKEGEKVVMGAGLFFLRKGIDDFV 180

Query: 181 EVAAKMPDIRFIWFGETNKWVIPRKVRQIVTKQHPSNVTFAGYIKGDVYEGAMSASDAFF 240
+VA MPD+RFIWFGETNKWVIP +VRQ+V HP N+ F GYIKGDVYEGAM+ +DAFF
Sbjct: 181 KVAQAMPDVRFIWFGETNKWVIPAQVRQMVNGNHPKNLIFPGYIKGDVYEGAMTGADAFF 240

Query: 241 FPSREETEGIVVLEALASHQHVVLRDIPVYHGWVTEDSVELATDVDGFVEKLDKVLSGKS 300
FPSREETEGIVVLEALAS QH+VLRDIPVY+GWV + S ELATD+ GF+E L KV SG S
Sbjct: 241 FPSREETEGIVVLEALASRQHLVLRDIPVYYGWVDQSSAELATDIPGFIEALKKVFSGAS 300

Query: 301 DKIKEGYHVAESRSIERIAHELASVYQKVMEL 332
+K++ GY VA+SR +E + H L VY+KVMEL
Sbjct: 301 NKVEAGYKVAQSRRLETVGHALVDVYKKVMEL 332





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 374 

A DNA sequence (GBSx0405) was identified in S.agalactiae <SEQ ID 1221> which encodes the amino
acid sequence <SEQ ID 1222>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5487 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
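
The "Final Results" lines in this document follow a regular pattern: a predicted location, a certainty score and a verdict. As a minimal sketch only, such lines might be tabulated as shown below. The function names and the regular expression are illustrative assumptions and are not part of the analysis described in this document; the pattern is written to tolerate the stray spaces that appear inside some scores.

```python
import re

# Assumed line shape, modelled on the "Final Results" lines in this document:
#   bacterial cytoplasm --- Certainty=0.5487 (Affirmative) < succ>
# Optional dashes and stray spaces inside the score are tolerated.
LINE_RE = re.compile(
    r"^(bacterial \w+)\s*-*\s*Certainty\s*=\s*([0-9][0-9. ]*?)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for every recognised line."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            location, raw_score, verdict = match.groups()
            # Remove spaces that text extraction may have left inside the number.
            results[location] = (float(raw_score.replace(" ", "")), verdict)
    return results


def best_location(results):
    """Return the location with the highest certainty score."""
    return max(results, key=lambda location: results[location][0])


example = """\
bacterial cytoplasm --- Certainty=0.5487 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

parsed = parse_final_results(example)
print(best_location(parsed), parsed[best_location(parsed)])
```

For the values shown for this Example, the highest-scoring location is the bacterial cytoplasm.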



The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC35010 GB:AF055987 intracellular alpha-amylase [Streptococcus mutans]
Identities = 308/483 (63%), Positives = 378/483 (77%)

Query: 1 MTNELIMQAFEWYLPSDGNHWKKLEESISDLKKLGISKIWLPPAFKGTSSDDVGYGVYDL 60 
MTNE +MQ FEWYLP+DG HW+ L E S LK +GISK+W+PPAFKGT S+DVGYGVYDL

Sbjct: 1 MTNETWQYFEWYLPNDGKHWQHLAEDASHLKNIGISKVWMPPAFKGTGSNDVGYGVYDL 60 

Query: 61 FDLGEFDQNGTIRTKYGRKEEYLKLIKSLKANGIKPFADIVLNHKANGDHKEKFQVIKVN 120 
+DLGEF+QNGT+RTKYG +E+YL + +LK I P +DIVLNHKANGD KE+FQV+KVN 
Sbjct: 61 YDLGEFNQNGTWTKYGSREDYIjNAvNALKEQEIMPISDIVIiNHKANGDAKERFQvvKVN 120



Query: 121 PENRQE^SEPYEIEGWTGFDFPGRC^EYNDFKTJHWYHFTGLDYDAKNNETDIFMIVGDN 180 

P NRQE +SEPYEIEGWT F+FPGRQ Y+DFKWHWYHFTG+DYDA +NE I+MI+GDN 
Sbjct: 121 PSNRQEKISEPYEIEGWTQFNFPGRQDNYSDFKWHWYHFTGVDYDALHNENGIYMILGDN 180

Query: 181 KGWADDDLIDDENGNFDYLMYNDIDFKHPEVIKNLQDWAKWFIETTGIEGFRLDAVKHID 240 
KGWA + ID ENGN+DYLMY+DIDFKHPEV ++L+DW WF+ET+G+ GFRLDA+KHID
Sbjct: 181 KGWASQENIDQENGNYDYLMYDDIDFKHPEVQEHLRDWVAWFLETSGVGGFRLDAIKHID 240 

Query: 241 SYFIQTFINDIRTKIKPDLEVFGEYWKSDQTSMKDYLEATQFQFSLVDVTLHMNFFDASH 300
F+ FI IR +K DL VFGEYWK + DYL + QF L+DV LHM+ F+A

Sbjct: 241 KTFMAQFIRYIREHLKADLYVFGEYWKDSHFDITDYLHSVDLQFDLIDVMLHMSLFEAGQ 300 

Query: 301 QNRDFDMRTIFDDSLVIDNPEYAVTFVENHDTQSGQALESRVEDWFKPLAYGLILLRQQG 360 
+ DFD+ TI DDSL+ +P++AVTFV+NHD+Q GQALES V +WFKPLAYGLILLRQ+G 
Sbjct: 301 KGSDFDLSTILDDSLMKSHPDFAVTFVDNHDSQRGQALESTVAEWFKPLAYGLILLRQEG 360

Query: 361 TPCLFYGDYYGIQGEFGQPSFKEVIDKMAELRQNYVFGKQVDYFTHSNCIGWTCLGDEEH 420 

PC+FYGDYYGI GEF Q SF+ V+DK+ +RQ +V+G + T NCIGWTCLGDEEH 
Sbjct: 361 IPCVFYGDYYGISGEFAQESFQTVLDKLLYIRQYHVYGSKKIILTMPNCIGWTCLGDEEH 420 


Query: 421 NSCLAWLTNGDQGWKHMEVGEIYAGKTFVDYLGNCEQEWIGDDGWGDFLVESASISAW 480 

+AV+++NG+ K M +GE . K FVDYL NC +EV++ D GWGDF V+ AS+SAW 
Sbjct: 421 PDGVAVIlSNGEANCKR^lNMGEFNRNKVFVDYLNNCTEEV'ILDDQGWGDFPVQEASIlSAW 480 

Query: 481 VPK 483

V K 

Sbjct: 481 VNK 483 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1223> which encodes the amino acid
sequence <SEQ ID 1224>. Analysis of this protein sequence reveals the following:

Possible site: 30 

>>> Seems to have a cleavable N-term signal seq. 

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAB00845 GB:M57692 alpha-cyclodextrin glycosyltransferase
[Thermoanaerobacterium thermosulfurigenes]
Identities = 356/710 (50%), Positives = 468/710 (65%), Gaps = 16/710 (2%)

Query: 7 KTYKLLTKSAVLLGLISFPLT--VSAADNASVTNKADFSTDTIYQIVTDRFNDGNTSNNG 64

KT+KL+ + L h+ F LT + AA + +V+N ++STD IYQIVTDRF DGNTSNN 
Sbjct: 3 KTFKLILV1MLSLTLV-FGLTAPIQAASDTAVSNVVNYSTDVIYQIVTDRFVDGNTSNNP 61 

Query: 65 KTDVFDKN- -DLKKYHGGDWQGIIAKIKDGYLTDMGISAIWISSPVENIDSIDPSN G 119 

D++D LKKY GGDWQGII KI DGYLT MG++AIWIS PVENI ++ P + G

Sbjct: 62 TGDLYDPTHTSLKKYFGGDWQGIINKINDGYLTGMGVTAIWISQPVENIYAVLPDSTFGG 121

Query: 120 SAAYHGYWAKDFFKTNQHFGTEADFQQLVKVAHQHHIKWIDFAPNHTSTAEKEGTTFKE 179 
S +YHGYWA+DF +TN +FG+ DFQ L+ AH H+IKV+IDFAPNHTS A + T+ E 
Sbjct: 122 STSYHGYWARDFKRTNPYFGSFTDFQNLINTAHAHNIKVIIDFAPNHTSPASETDPTYAE 181

Query: 180 DGALYKNGKLVGKFSDDKDKIFNHESWTDFSTYFjNSIYHS^GLADLNNINPKTOQYMKE 239 

+G LY NG L+G +++D + F+H TDFS+YE+ IY +++ LADLN N +D Y+K 
Sbjct: 182 NGRLYDNGTLLGGYTNDTNGYFHHYGGTDFSSYEDGIYRNLFDLADLNQQNSTIDSYLKS 241 


Query: 240 AIDKWLDLGVDGIRVDAVK^SCGWQKNWLSHIYEKHNVFVFGEWFSGHTDDDYDMTTFA 299 

AI WLD+G+DGIR+DAVKHM GWQKN++ I VF FGEWF G + D + T FA 

Sbjct: 242 AIKVV^DMGIDGIRLDAWHMPFGWQKNFra)SILSYRPVFTFGEWFLGTNEIDVNNTYFA 301 

Query: 300 NNSGMGLLDFRFANAIRQLYTGFSTFTMRDFYKVLENRDQVTNEVTDQVTFIDNHDMERF 359

N SGM LLDFRF+ +RQ++ +T TM ++++ N + D VTFIDNHDM+RF 

Sbjct: 302 NESGMSLLDFRFSQKVRQVFRD-NTDIWYGLDSMIQSTASDYNFINDMVTFIDNHDMDRF 360 






Query: 360 ATKVANNQTAVNQAYALLLTSRGVPNIYYGTEQYATGDKDPNNRGDMPSFNKESQAYKVI 419 
+ V QA A LTSRGVP IYYGTEQY TG+ DP NR M SFN + AY VI 
Sbjct: 361 YN--GGSTRPVEQALAFTLTSRGVPAIYYGTEQYMTGNGDPYNRAMMTSFNTSTTAYNVI 418

Query: 420 SKIAPLRKQNQAIAYGTTEQRWISDHVLWERKFGNHVALVAIITODQTNGYTITNAKTAL 479 

KLAPLRK N A+AYGTT+QRWI++ V ++ERKFGN+VALVAINR+ + Y IT TAL 
Sbjct: 419 KKLAPLRKSNPAIAYGTTQQRWIMCDVYIYERKFGNNVALVAINRlttSTSYN 478 

Query: 480 PQNSYKDKLEGLLGGQELIVGADGTISSFELGAGQVAVWTYEGEDKTPQLGDVDASVGIA 539 

P +Y D L GLL G + V +DG+++ F L AG+VAVW Y +P +G V ++ A 

Sbjct: 479 PAGTYTDVLGGLLNGNSISVASDGSVTPFTLSAGEVAVWQYVSSSNSPLIGHVGPTMTKA 538 

Query: 540 GNKITISGQGFGNSKGQVTFGEISAEILSWSDTLITLKVPTVPANYYNISVTTADKQTSN 599 

G ITI G+GFG + GQV FG + I+SW DT + +KVP+V YNIS+ T+ TSN 
Sbjct: 539 GQTITIDGRGFGTTSGQVLFGSTAGTIVSWDDTEVKVKVPSVTPGKYNISLKTSSGATSN 598 

Query: 600 SYQAFEVLTDKQIPTOLLINDFKTVPGEQLYmGDVFEMGANDAKNAVGPLFNNTQTlAK 659

+Y +LT QI VR ++N+ TV GE +YL G+V E+G D A+GP+FN Q + + 
Sbjct: 599 TYMNINILTGNQICTOFVVNNASTVYGENVYLTGNVAELGNWDTSKAIGPMFN- - QWYQ 656 

Query: 660 YPNWFFDTHLPINKEIAVKLVKKDSIGMVLWT--SPETYSIKTGHEAQTI 707 
YP W++D +P IK +KK+ + W S TY++ + I

Sbjct: 657 YPTWYYDVSVPAGTTIQFKFIKKNG-NTITWEGGSNHTYTVPSSSTGTVI 705 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 112/509 (22%) , Positives = 193/509 (37%) , Gaps = 103/509 (20%) 


Query: 18 GNHWKKLEESISD--LKKLGISKIWLPPAFKGTSSDDV GYGVYDLFDLGEFD 67 

G W+ + ID L +GIS IW+ + S D GY D F + 

Sbjct: 79 GGDWQGIIAKIKDGYLTDMGISAIWISSPVENIDSIDPSNGSAAYHGYWAKDFFKTNQH- 137 

Query: 68 QNGTIRTKYGRKEEYLKLIKSLKANGIKPFADIVI^ 127

+G + ++ +L+K + IK D NH + + + 
Sbjct: 138 FGTEADFQQLVKVAHQHHIKWIDFAPNHTSTAEKE 173 

Query: 128 LSEPYEIEGWTGFDFPGRQGEYNDFKOTIWYHFTGLDYDAKNNETDIFMIVGDNKGWADDD 187 
G F Y + K G D K+ + +++ W D

Sbjct: 174 GTTFKEDGALYKNGK LVGKFSDDKDK IFNHESWTDFS 210 

Query: 188 LIDDE- -NGNFDYLMYNDIDFKHPEVIKNLQDWAKWFIETTGIEGFRLDAVKHIDSYFIQ 245 
++ + + N+I+ K + +K D KW G++G R+DAVKH+ + + 

Sbjct: 211 TYENSIYHSMYGLADLNNINPKVDQYMKEAID--KWL--DLGVDGIRVDAVKHMSQGWQK 266

Query: 246 TFINDIRTKIKPDLEVFGEYWKSDQTSMKDYLEATQFQFSLVDVTLHMNFFDASHQ-NRD 304 

+++ I K ++ VFGE WST D+TF+ L F+AQ 

Sbjct: 267 NWLSHIYE- -KHNVFVFGE-WFSGHTD- -DDYDMTTFANNSGMGLLDFRFANAIRQLYTG 321 


Query: 305 FDMRTIFDDSLVIDNPEYA VTFVENHDTQSGQALESRVEDWFKPLAYGLILLR 357 

F T+ D V++N + VTF++NHD + + + AY L LL 

Sbjct: 322 FSTFTMRDFYKVLENRDQVTNEVTDQVTF I DNHDMERFATKVANNQTAVNQ - AYAL - LLT 379 

Query: 358 QQGTPCLFYGDYYGIQGE FGQPSFK EVIDKMAELR QNYVFGKQVD 402

+G P ++YG G+ PSF +VI K+A LR Q +G 

Sbjct: 380 SRGVPNIYYGTEQYATGDKDPN^GDMPSFNKESQAYKVISKLAPLRKQNQALAYGTTEQ 439 

Query: 403 YFTHSNCIGWTCLGDEEHNSCLAWLTNGDQ- -GWKHMEVGEIYAGKTFVDYLGNC- -EQ 458 
++++ + ++ +A+V N DQ G+ ++ D L Q

Sbjct: 440 RWISDHVL VFERKFGNHVALVAINRDQTMGYTITNAKTALPQNSYKDKLEGijLGGQ 495 

Query: 459 EWIGDDGW-GDFLVESASISAWVPKIEE 486 
E+++G DG F + + ++ W + E+ 
Sbjct: 496 ELIVGADGTISSFELGAGQVAVWTYEGED 524



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 375

A DNA sequence (GBSx0406) was identified in S.agalactiae <SEQ ID 1225> which encodes the amino 
acid sequence <SEQ ID 1226>. This protein is predicted to be catabolite control protein A. Analysis of this 
protein sequence reveals the following: 

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2154 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9707> which encodes amino acid sequence <SEQ ID 9708> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:EAA88121 GB:AB028599 catabolite control protein A [Streptococcus 
bovis] (ver 3) 
Identities = 304/332 (91%) , Positives = 320/332 (95%) 

Query: 1 MNTDDTITIYDVAREAGVSMATVSRVVNGNKNVKENTRKKVLEVIDRLDYRPNAVARGLA 60

MKrrDDTITIYDVAREAGVSMAWSRvWGNKNVKEOTRKKVLEVIDRLDYRP 
Sbjct: 1 MNTDDT I T I YDVAREAGVSMATOSRVVNGNKNVTCENTRKKVLEVIDRLDYRPNAVARG1A 60 

Query: 61 SKKTTTVGVVIPNIANSYFSILARGIDDIAAMYKYNIVLASSDEDDDKEVNVVNTLFAKQ 120
SKKTTTVGWIPNIANSYFSII^+GIDDIAAMYKTOIVX^SDEDDDKEVNvVNTLFAKQ

Sbjct: 61 SKKTTWGWIPNIANSYFSILAKGIDDIARMYKYNIVLASSDEDDDKEVNVVNTLFAKQ 120 

Query: 121 VDGIIFMGHHLTEKIRAEFSRSRTPIVLAGTVDLEHQLPSVNIDYKAAAVDVIDILAGNH 180
VDGIIFMGHHLTEKIRAEFSRSRTP+VLAGTVDLEHQLPSVNIDYKAA DV+DILA N+
Sbjct: 121 VDGIIFMGHHLTEKIRAEFSRSRTPVVLAGTVDLEHQLPSVNIDYKAAVADVVDILAKNN 180

Query: 181 KDIAFVSGPLIDDINGKVRLAGYKEGLKKNGLNFKEGLVFEANYRYAEGFALAQRVINAG 240

KDIAFVSGPLIDDINGKVRLAGYKEGL+KN L+FKEGLVFEANY Y +G+ LAQRV+N+G 
Sbjct: 181 KDIAFVSGPLIDDINGKTOIAGYKEGLEKNNLSFKEGLVFFANYNYKDGYE1AQRVMNSG 240 


Query: 241 ATAAYVAEDEIAAGLLNGLFEAGKRVPEDFEIITSNDSPIAQYTRPNLTSISQPVYDLGA 300 

ATAAYVAEDELAAGLLNGLF AGK+VPEDFEI+TSNDSPI YTRPNL+SISQPVYDLGA 
Sbjct: 241 ATAAYVAEDEIAAGLLiNGLFAAGKKVPEDFEILTSlSIDSPITSYTRPNLSSISQPVYDLGA 300 

Query: 301 VSMRMLTKIMHKEELEEKEIVLNHGIVKRGTT 332

VSMRMLTKIM+KEELEEKEI+LNHG+ RGTT 
Sbjct: 301 VSMRMLTKIMNKEELEEKEIILNHGLKLRGTT 332 
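
The identity counts reported with each alignment can be checked row by row. As a hedged illustration, the last row of the alignment above (residues 301 to 332) is compared below. The helper name `count_identities` is hypothetical, and only identities are counted, because the "Positives" figure depends on a substitution matrix that is not reproduced in this document.

```python
def count_identities(query, subject):
    """Count aligned columns in which both rows carry the same residue.

    Both strings must be the same length; '-' marks a gap and never
    counts as an identity.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    identical = sum(
        1 for q, s in zip(query, subject) if q == s and q != "-"
    )
    return identical, len(query)


# Final row (301-332) of the alignment shown above.
query_row = "VSMRMLTKIMHKEELEEKEIVLNHGIVKRGTT"
subject_row = "VSMRMLTKIMNKEELEEKEIILNHGLKLRGTT"

identical, columns = count_identities(query_row, subject_row)
print(f"{identical}/{columns} identical columns in this row")
```

Summing the per-row counts over a whole alignment gives the numerator of the "Identities" figure printed in its header.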

A related DNA sequence was identified in S.pyogenes <SEQ ID 1227> which encodes the amino acid
sequence <SEQ ID 1228>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2154 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 307/332 (92%), Positives = 320/332 (95%)

Query: 1 MNTDDTITIYDVAREAGVSMATVSRVVNGNKNVKENTRKKVLEVIDRLDYRPNAVARGLA 60
MNTDD +TIYDVAREAGVSMATVSRVVNGNKNVKENTRKKVLEVIDRLDYRPNAVARGLA
Sbjct: 1 MNTDDPLTIYDVAREAGVSMATVSRVVNGNKNVKENTRKKVLEVIDRLDYRPNAVARGLA 60

Query: 61 SKKTTTVGVVIPNIANSYFSILARGIDDIAAMYKYNIVLASSDEDDDKEVNVVNTLFAKQ 120
SKKTTTVGVVIPNIANSYFSILA+GIDDIAAMYKYNIVLASSDEDDDKEVNVVNTLFAKQ
Sbjct: 61 SKKTTTVGVVIPNIANSYFSILAKGIDDIAAMYKYNIVLASSDEDDDKEVNVVNTLFAKQ 120

Query: 121 VDGIIFMGHHLTEKIRAEFSRSRTPIVLAGTVDLEHQLPSVNIDYKAAAVDVIDILAGNH 180
VDGIIFMGHHLTEKIRAEFSRSRTP+VLAGTVDL+HQLPSVNIDY+AA +V+DILA NH
Sbjct: 121 VDGIIFMGHHLTEKIRAEFSRSRTPVVLAGTVDLDHQLPSVNIDYRAAVSNVVDILAENH 180

Query: 181 KDIAFVSGPLIDDINGKVRLAGYKEGLKKNGLNFKEGLVFEANYRYAEGFALAQRVINAG 240
K IAFVSGPLIDDINGKVRLAGYKEGLK N L+FKEGLVFEANY Y EGF LAQRVIN+G
Sbjct: 181 KCIAFVSGPLIDDINGKVRLAGYKEGLKHNKLDFKEGLVFEANYSYKEGFELAQRVINSG 240

Query: 241 ATAAYVAEDEIAAGLLNGLFEAGKRVPEDFEIITSNDSPIAQYTRPNLTSISQPVYDLGA 300
ATAAYVAEDEIAAGLLNGLFEAGKRVPEDFEIITSNDSP+ QYTRPNL+SISQPVYDLGA
Sbjct: 241 ATAAYVAEDEIAAGLLNGLFEAGKRVPEDFEIITSNDSPVVQYTRPNLSSISQPVYDLGA 300

Query: 301 VSMRMLTKIMHKEELEEKEIVLNHGIVKRGTT 332
VSMRMLTKIM+KEELEEKEI+LNHGI KRGTT
Sbjct: 301 VSMRMLTKIMNKEELEEKEILLNHGIKKRGTT 332





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 376 

A DNA sequence (GBSx0407) was identified in S.agalactiae <SEQ ID 1229> which encodes the amino 
acid sequence <SEQ ID 1230>. This protein is predicted to be PepQ (pepQ-2). Analysis of this protein 
sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1118 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC46293 GB:AF014460 PepQ [Streptococcus mutans] 
Identities = 257/359 (71%), Positives = 304/359 (84%)

Query: 1 MSKLNRIRHHLHSVQAELAVFSDPVTVNYLTGFFCDPHERQMFLFVYEDRDPILFVPALE 60
MSKL +I L E AV SDPV++NYLTGF+ DPHER MFLF++ D++ +LF+P L+
Sbjct: 1 MSKLAQIVQKLKKQGIEAAVLSDPVSINYLTGFYSDPHERLMFLFLFADQETLLFLPELD 60

Query: 61 VSRAKQSVPFPVFGYIDSENPWQKIASNLPSFSVSKVIAEFDNLNVTKFQGLQTVFDGHF 120
RAK + V GY+D ENP +KI + LP + SK+ EFDNLNVTKF+GL+T+F G F
Sbjct: 61 ALRAKSILDISVTGYLDFENPLEKIKTLLPKTNYSKIALEFDNLNVTKFKGLETIFSGQF 120

Query: 121 ENLTPYIQNMRLIKSRDEIEKMLVAGEFADKAVQVGFDNISLNNTETDIIAQIEFEMKKQ 180
NLTP I MRLIKS DEI+K+L+AGE ADKAVQ+GFD+ISLN TETDIIAQIEFEMKK
Sbjct: 121 TNLTPLINRMRLIKSADEIQKLLIAGELADKAVQIGFDSISLNATETDIIAQIEFEMKKL 180

Query: 181 GINKMSFDTMVLTGNNAANPHGIPGTNKIENNALLLFDLGVETLGYTSDMTRTVAVGKPD 240
G++KMSF+TMVLTG+NAANPHG+P ++KIENN LLLFDLGVE+ GY SDMTRTVAVG+PD
Sbjct: 181 GVDKMSFETMVLTGSNAANPHGLPASHKIENNHLLLFDLGVESTGYVSDMTRTVAVGQPD 240

Query: 241 QFKKDIYHLCLEAHQAAIDFIKPGVIASEVDAAARNVIEKAGYGQYFNHRLGHGLGMDVH 300
QFKKDIY++CLEA A+DFIKPGV A++VDAAAR+VIEKAGYG YFNHRLGHG+GM +H
Sbjct: 241 QFKKDIYNICLEAQLTALDFIKPGVSAAQVDAAARSVIEKAGYGDYFNHRLGHGIGMGLH 300

Query: 301 EFPSIMAGNDMEIQEGMCFSVEPGIYIPDKVGVRIEDCGYVTKTGFEVFTKTPKELLYF 359
EFPSIMAGNDM ++EGMCFSVEPGIYIP+KVGVRIEDCG+VTK GFEVFT+TPKELLYF
Sbjct: 301 EFPSIMAGNDMLLEEGMCFSVEPGIYIPEKVGVRIEDCGHVTKNGFEVFTQTPKELLYF 359

A related DNA sequence was identified in S.pyogenes <SEQ ID 1231> which encodes the amino acid
sequence <SEQ ID 1232>. Analysis of this protein sequence reveals the following: 

Possible site: 58 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.90 Transmembrane 42 - 58 ( 42 - 59) 



Final Results

bacterial membrane --- Certainty=0.1362 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAC46293 GB:AF014460 PepQ [Streptococcus mutans] 
Identities = 264/359 (73%), Positives = 304/359 (84%)

Query: 1 MTKLDQIRLYLDQKGAELAIFSDPVTINYLTGFFCDPHERQLFLFVYHDLAPVLFVPALE 60
M+KL QI L ++G E A+ SDPV+INYLTGF+ DPHER +FLF++ D +LF+P L+
Sbjct: 1 MSKLAQIVQKLKKQGIEAAVLSDPVSINYLTGFYSDPHERLMFLFLFADQETLLFLPELD 60

Query: 61 VARASQAISFPVFGYVDSENPWEKIKAVLPNTAAKTIYAEFDHLNVNKFHGLQTIFSGQF 120
RA + V GY+D ENP EKIK +LP T I EFD+LNV KF GL+TIFSGQF
Sbjct: 61 ALRAKSILDISVTGYLDFENPLEKIKTLLPKTNYSKIALEFDNLNVTKFKGLETIFSGQF 120

Query: 121 NNLTPYVQGMRLVKSADEINKMMIAGQFADKAVQVGFDNISLDATETDVIAQIEFEMKKQ 180
NLTP + MRL+KSADEI K++IAG+ ADKAVQ+GFD+ISL+ATETD+IAQIEFEMKK
Sbjct: 121 TNLTPLINRMRLIKSADEIQKLLIAGELADKAVQIGFDSISLNATETDIIAQIEFEMKKL 180

Query: 181 GIHKMSFDTMVLTGNNAANPHGIPGTNNIENNALLLFDLGVETLGYTSDMTRTVAVGQPD 240
G+ KMSF+TMVLTG+NAANPHG+P ++ IENN LLLFDLGVE+ GY SDMTRTVAVGQPD
Sbjct: 181 GVDKMSFETMVLTGSNAANPHGLPASHKIENNHLLLFDLGVESTGYVSDMTRTVAVGQPD 240

Query: 241 QFKIDIYNLCLEAQLAAIDFIKPGVTAAQVDAAARQVIEKAGYGEYFNHRLGHGIGMDVH 300
QFK DIYN+CLEAQL A+DFIKPGV+AAQVDAAAR VIEKAGYG+YFNHRLGHGIGM +H
Sbjct: 241 QFKKDIYNICLEAQLTALDFIKPGVSAAQVDAAARSVIEKAGYGDYFNHRLGHGIGMGLH 300

Query: 301 EFPSIMAGNDLVLEEGMCFSVEPGIYIPGKVGVRIEDCGHVTKNGFEVFTHTPKELLYF 359
EFPSIMAGND++LEEGMCFSVEPGIYIP KVGVRIEDCGHVTKNGFEVFT TPKELLYF
Sbjct: 301 EFPSIMAGNDMLLEEGMCFSVEPGIYIPEKVGVRIEDCGHVTKNGFEVFTQTPKELLYF 359



An alignment of the GAS and GBS proteins is shown below: 

Identities = 288/361 (79%), Positives = 325/361 (89%)

Query: 1 MSKLNRIRHHLHSVQAELAVFSDPVTVNYLTGFFCDPHERQMFLFVYEDRDPILFVPALE 60
M+KL++IR +L AELA+FSDPVT+NYLTGFFCDPHERQ+FLFVY D P+LFVPALE
Sbjct: 1 MTKLDQIRLYLDQKGAELAIFSDPVTINYLTGFFCDPHERQLFLFVYHDLAPVLFVPALE 60

Query: 61 VSRAKQSVPFPVFGYIDSENPWQKIASNLPSFSVSKVLAEFDNLNVTKFQGLQTVFDGHF 120
V+RA Q++ FPVFGY+DSENPW+KI + LP+ + + AEFD+LNV KF GLQT+F G F
Sbjct: 61 VARASQAISFPVFGYVDSENPWEKIKAVLPNTAAKTIYAEFDHLNVNKFHGLQTIFSGQF 120

Query: 121 ENLTPYIQNMRLIKSRDEIEKMLVAGEFADKAVQVGFDNISLNNTETDIIAQIEFEMKKQ 180
NLTPY+Q MRL+KS DEI KM++AG+FADKAVQVGFDNISL+ TETD+IAQIEFEMKKQ
Sbjct: 121 NNLTPYVQGMRLVKSADEINKMMIAGQFADKAVQVGFDNISLDATETDVIAQIEFEMKKQ 180

Query: 181 GINKMSFDTMVLTGNNAANPHGIPGTNKIENNALLLFDLGVETLGYTSDMTRTVAVGKPD 240
GI+KMSFDTMVLTGNNAANPHGIPGTN IENNALLLFDLGVETLGYTSDMTRTVAVG+PD
Sbjct: 181 GIHKMSFDTMVLTGNNAANPHGIPGTNNIENNALLLFDLGVETLGYTSDMTRTVAVGQPD 240

Query: 241 QFKKDIYHLCLEAHQAAIDFIKPGVLASEVDAAARNVIEKAGYGQYFNHRLGHGLGMDVH 300
QFK DIY+LCLEA AAIDFIKPGV A++VDAAAR VIEKAGYG+YFNHRLGHG+GMDVH
Sbjct: 241 QFKIDIYNLCLEAQLAAIDFIKPGVTAAQVDAAARQVIEKAGYGEYFNHRLGHGIGMDVH 300

Query: 301 EFPSIMAGNDMEIQEGMCFSVEPGIYIPDKVGVRIEDCGYVTKTGFEVFTKTPKELLYFEG 361
EFPSIMAGND+ ++EGMCFSVEPGIYIP KVGVRIEDCG+VTK GFEVFT TPKELLYFEG
Sbjct: 301 EFPSIMAGNDLVLEEGMCFSVEPGIYIPGKVGVRIEDCGHVTKNGFEVFTHTPKELLYFEG 361
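
Each alignment row carries its own start and end coordinates, so the number of residues in the row (gap characters excluded) should equal end - start + 1. The sketch below, with hypothetical helper names, applies that check to two rows that appear in this document; it is offered as one possible way to detect rows that were damaged during text extraction, not as part of the original analysis.

```python
def residues_in_row(sequence):
    """Count residue letters in an alignment row, ignoring gaps and spaces."""
    return sum(1 for character in sequence if character.isalpha())


def row_is_consistent(start, sequence, end):
    """True when the residue count matches the row's printed coordinates."""
    return residues_in_row(sequence) == end - start + 1


# Ungapped row: residues 301 to 361 (61 residues).
ungapped = "EFPSIMAGNDLVLEEGMCFSVEPGIYIPGKVGVRIEDCGHVTKNGFEVFTHTPKELLYFEG"
print(row_is_consistent(301, ungapped, 361))

# Gapped row: residues 121 to 178 (58 residues plus two gap characters).
gapped = "FYSQADLVLCPSEYTKDVLRAYPVDAPIRQLSNGVDLESMQGYESFRADTRARFDL--DG"
print(row_is_consistent(121, gapped, 178))

# A row that has lost a residue no longer matches its coordinates.
print(row_is_consistent(301, ungapped[:-1], 361))
```

A row that fails this check has usually gained or lost characters and should be compared against another copy of the same sequence before it is relied upon.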

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 377 

A DNA sequence (GBSx0408) was identified in S.agalactiae <SEQ ID 1233> which encodes the amino 
acid sequence <SEQ ID 1234>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3629 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 378 

A DNA sequence (GBSx0409) was identified in S.agalactiae <SEQ ID 1235> which encodes the amino
acid sequence <SEQ ID 1236>. This protein is predicted to be beta-hexosaminidase A precursor. Analysis of
this protein sequence reveals the following:

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3279 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB11942 GB:Z99104 alternate gene name: yzbA-similar to
beta-hexosaminidase [Bacillus subtilis]
Identities = 151/602 (25%), Positives = 268/602 (44%), Gaps = 69/602 (11%)

Query: 26 INEMTLDEKIGQLF FNMGASRSEEYLTDVLDRYHIAAVRYNRGSSSEIYDQNL- 78 

+N M+LDEK+GQ+ + S + LT + D +Y G ++ +N+

Sbjct: 39 VNRMSLDEKLGQMLMPDFRNWQKEGESSPQALTKMNDEVASLVKKYQFGGI I -LFAENVK 97 

Query: 79 ILQTKSKLPMLIAANTEAGGrX^VTDGTKVGDEIKVAA'TNDPKYAYEMG 127 

+ K+P++++ + E G + +GT + + A AY+ G 

Sbjct: 98 TTKQTVQLTDDYQKASPKIPLMLSIDQEGGIVTRLGEGTNFPGNMALGAARSRINAYQTG 157






Query: 128 RIAGMFASAVGCNASFSPIvDLTRNWRNPIIASRNWGANVDQIISLSKEYMKGIMQYNIV 187 

I G E SA+G N FSP+VD+ N NP+I R++ +N + L MKG+ + +1 
Sbjct: 158 SIIGKELSALGINTDFSPWDINNNPDNPVIGVRSFSSNRELTSRLGLYTMKGLQRQDIA 217 

Query: 188 PFAKHFPGDGIDERDHHLSFASNPMSKEEWMSTFGRIYGELADAGLPGVMAGHIHLPNVE 247 

KHFPG G + D H +E + + DAG VM H+ P + 

Sbjct: 218 SALKHFPGHGDTDVDSHYGLPLVSHGQERLREVELYPFQKAIDAGADMVMTAHVQFPAFD 277 
Query: 248 KEMHPER--DLDDMLPASI1NKTLLDELLRGELGYNGAIVTDASHMVGMTASMARRDLLPT 305

+ + D ++PA+L+K ++ LLR E+G+NG IVTDA +M + + + + 

Sbjct: 278 DTTYKSKLDGSDILVPATLSKI07MTGLLRQEMGFNGVIVTDALNMKAIADHFGQEEAVVM 337 

Query: 306 AIEAGCDLFLF FNDPDED IQWMKEGYEKGILTEERLHDALRRTLGLKAKLG 356

A++AG D+ L E+ IQ +KE + G + E+++++++ R + LK K G 

Sbjct: 338 AVKAGVDIALMPASVTSLKEEQKFARVIQALKEAVKNGDIPEQQINNSVERIISLKIKRG 397 

Query: 357 LHNYEGRRQELFMPK-DKAMALIOTLESQKIADEVaDKAVTLVKDKQKDIFPVNPERYRH 415 
+ Y R + KKA++ + +K ++A+KAVT++K++Q + P P++

Sbjct: 398 M- -YPARNSDSTKEKIAKAKKIVGSKQHLKAEKKLAEKAVTVLKNEQHTL- PFKPKKGSR 454 

Query: 416 ILLVNVEGYKGGFGAMIAGNKQRASDYMKE LLEARGHEVTVWESTEERIMKLPQ 469 

IL+V + A +Q D +K L V+++ E+ +K 

Sbjct: 455 ILIV APYEEQTASIEQTIHDLIKRKKIKPVSLSKMNFASQVFKTEHEKQVK--- 505

Query: 470 EERAAAIANVYAQK-QPIANLTEHYDLIINLVDVNAGGTTQRIIWPAAKGTPDQPFYVHE 528 

E 1 Y K P+ N D +1+ D + + ++P A + H 

Sbjct: 506 -EADYIITGSYWKNDPWN DGVID- -DTISDSSKWATVFPRA VMKAALQHN 554 


Query: 529 I PS I VI SVQHAFALADMPQVGTYINAYD GLPSTI SAWAKLAGESEFTGVSP 580 

P +++S+++ +A++ IY LIAV + G+++ G P 

Sbjct: 555 KPFVLMSLRNPYDAANFEEAKALIAVYGFKGYANGRYLQPNIPAGVMAIFGQAKPKGTLP 614 

Query: 581 VD 582

VD 

Sbjct: 615 VD 616 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8565> and protein <SEQ ID 8566> were also identified. Analysis of this
protein sequence reveals the following homology to a lipoprotein, with homology with the following 
sequences in the databases: 

29.5/52.3% over 422aa 

Bacillus subtilis 

EGAD|20114| hypothetical 70.6 kd protein in feua 5'region precursor Insert characterized
SP|P40406|YBBD_BACSU HYPOTHETICAL 70.6 KDA LIPOPROTEIN IN FEUA-SIGW INTERGENIC REGION
PRECURSOR (ORF1). Insert characterized
GP|1944006|dbj|BAA19499.1||AB002150 YbbD Insert characterized
GP|438455|gb|AAA64351.1||L19954 possible N-terminal signal sequence; mature protein may
be membrane-anchored and start at Cys-17. 17.5% identity over 354-aa overlap with
Candida pelliculosa beta-glucosidase.; putative Insert characterized
GP|2632433|emb Insert characterized

ORF0043K367 - 1557 of 2388)

EGAD|20114|BS0166(36 - 458 of 642) hypothetical 70.6 kd protein in feua 5'region precursor
{Bacillus subtilis} SP|P40406|YBBD_BACSU HYPOTHETICAL 70.6 KDA LIPOPROTEIN IN FEUA-SIGW
INTERGENIC REGION PRECURSOR (ORF1). GP|1944006|dbj|BAA19499.1||AB002150 YbbD {Bacillus
subtilis} GP|438455|gb|AAA64351.1||L19954 possible N-terminal signal sequence; mature
protein may be membrane-anchored and start at Cys-17. 17.5% identity over 354-aa overlap
with Candida pelliculosa beta-glucosidase.; putative {Bacillus subtilis} GP|2632433|emb
%Match = 9.6

%Identity = 29.5 %Similarity = 52.2
Matches = 119 Mismatches = 183 Conservative Sub.s = 92

114 144 174 204 234 264 294 324 

LMVGDSLGDIAAAEQNGIAFYPVLVGKEWSWEILREDIGEAFAKGQFEQQR 

MRPVFPLILSAVLFLSCFFGA

10 20 

354 384 414 426 456 486 528 
KKPFNLNQEAIEWIEKTINEMTLDEKIGQLFF NMGASRSEEYLTDVLDRYHIAAVRYNRGS SSEIYDQ 
RQTEASASKRAIDANQIVNRMSLDEKIGQMLMPDFRNWQK^

40 50 60 70 80 90 100 

543 573 603 633 663 693 723 753 

NLIL QTKBKLPMLIAANTEAGGDGAVTOCT^ 

= I = |:|:::: =11 = =11 = = I 11=1111 11=1 I 111=11= 

TVQLTDDYQKASPKIPLMLSIDQEGGIVTRLGEGTNFPGN^^ 

120 130 140 150 160 170 180 



783 813 843 873 903 933 963 993 

KWRNPIIASRNWGANVDQIISLSKEYMKGIMQYNI^ 

i 11 = 1 i = = =i = i iii= = =i iiiii i=ii= =i = = ii 

NPDNPVIGVRSFSSNRELTSRLGLYTMKGLQRQDIASALKHFPGHGDTDVDSHYGLPLVSHGQERLREVELYPFQKAIDA 
200 210 220 230 240 250 260

1023 1053 1080 1107 1137 1167 1197 1227 

GLPGVMAGHIHLPNVEKEMHPER-DLDDML-PASMKTLLDELL^ 

I II |:::| = = = I 1=1 11=1=1 == III 1=1=11 IIIII =1 = = = = l==l 

GADMVMTAHVQFPAFDDTTYKSKLDGSDILVPATLSKKVMTGL^

280 290 300 310 320 330 340 



1290 1320 1350 1380 1410 1437 

GCDLFLF FNDPDE DIQWMKEGYEKGILTEERLHDALRRTLGLKAKLGLHNYEGRRQELFMPK-DKAMAIiIN 



GVDIALMPASVTSLKEEQKFARVIQALKEAVKNGDIPEQQINNSVERIISLKIKRGM- -YPARNSDSTKEKIAKAKKIVG 
360 370 380 390 400 410 



1467 1497 1527 1557 1587 1617 1647 1677 

TLESQKIADEVADKAVTLVKDKQKDI FPWPERYRHILLVNVEGYRGGFGAMIAGNKQRASDYMKELLEARGHEVTVWES

: : | :,|:||||::|::| :| |:: ||,| : : | |::: 

SKQHLKAEKKLAEKAVTVLKNEQ-HTLPFKPK^ 

430 440 450 460 470 480 490 

SEQ ID 1236 (GBS50) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 11 (lane 8; MW 69.2kDa).

GBS50-His was purified as shown in Figure 192, lane 5. 

The GBS50-His fusion product was purified (Figure 192, lane 5) and used to immunise mice. The resulting 
antiserum was used for FACS (Figure 264), which confirmed that the protein is immunoaccessible on GBS 
bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 379 

A DNA sequence (GBSx0410) was identified in S.agalactiae <SEQ ID 1237> which encodes the amino 
acid sequence <SEQ ID 1238>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2266 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 380 

A DNA sequence (GBSx0411) was identified in S.agalactiae <SEQ ID 1239> which encodes the amino
acid sequence <SEQ ID 1240>. Analysis of this protein sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2279 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9705> which encodes amino acid sequence <SEQ ID 9706>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC21726 GB:U32690 oxidoreductase [Haemophilus influenzae Rd] 
Identities = 197/271 (72%) , Positives = 229/271 (83%) 

Query: 26 NKVVVITGAGGVLCGYMAKEFAKAGAKVALLDLNQEAAQTFADEIVEEGGIAKAYKANVL 85

NK+++ITGAGGVLC ++AK+ A A +ALLDLN EAA A EI + GG AKAYK NVL 
Sbjct: 15 NKLI I ITGAGGVLCSFLAKQIAYTKANIALLDI^FEAADK^AKEINQSGGKAKAYKTNVL 74 

Query: 86 SKENLEEVHQAVLEDLGPTDILVNGAGGNNPKATTDNEFHELDLPSETKTFFELDEAGIS 145 
EN++EV + D G DIL+NGAGGNNPKATTnNEFH+ DL T+TFF+LD++GI

Sbjct: 75 ELENIKEVraQIETDFGTCDILINGAGGNNPKATTntffiFHQFDIiNETTRTFFDLDKSGIE 134 

Query: 146 FVFNLNYLGTLLPTQVFAQDMVGREGANIINISSMNAFTPLTKIPAYSGAKAAISNFTQW 205
FVFNLNYLG+LLPTQVFA+DM+G++GANIINISSMNAFTPLTKIPAYSGAKAAISNFTQW
Sbjct: 135 FVFNLNYLGSLLPTQVFAKDMLGKQGANIINISSMNAFTPLTKIPAYSGAKAAISNFTQW 194

Query: 206 IAVHFSKVGIRCNAIAPGFLVTNQNRSLLFTEDGQPTARAEKILNNTPMGRFGEASELIG 265 

IAV+FSKVGIRCNAIAPGFLV+NQN +LLF +G+PT RA KIL NTPMGRFGE+ EL+G 
Sbjct: 195 LAWFSKVGIRCWAIAPGFLVSNQNLALLFDTEGKPTDRANKILTNTPMGRFGESEELLG 254 






Query: 266 GLFFLADEKSSSFVNGVVLPIDGGFAAYSGV 296

L FL DE S+FVNGVVLP+DGGF+AYSGV
Sbjct: 255 ALLFLIDENYSAFVNGVVLPVDGGFSAYSGV 285

A related DNA sequence was identified in S.pyogenes <SEQ ID 1241> which encodes the amino acid
sequence <SEQ ID 1242>. Analysis of this protein sequence reveals the following:

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0358 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 77/279 (27%) , Positives = 125/279 (44%) , Gaps = 19/279 (6%) 

Query: 18 MSKTITFTNKVVVITGAGGVLCGYMAKEFAKAGAKVALLDLNQEAAQTFADEIVEEGGIA 77
M + K+ +ITGA + +AK +A+AGA + D+ QE EGA
Sbjct: 1 MENMFSLQGKIALITGASYGIGFEIAKAYAQAGATIVENDIKQELVDKGLAAYRELGIEA 60

Query: 78 KAYKANVLSKENLEEVHQAVLEDLGPTDILVNGAGGNNPKATTDNEFHELDLPSETKTFF 137
Y +V + ++++ + +++G DILVN AG
Sbjct: 61 HGYVCDVTDEAGIQQMVSQIEDEVGAIDILVMNaG IIRRTPML 103

Query: 138 ELDEAGISFVFNLNYLGTLLPTQVFAQDMVGREGANIINISSMNAFTPLTKIPAYSGAKA 197
E+ V +++ + ++ M+ + IINI SM + + AY+ AK
Sbjct: 104 EIViaAEDFRQVIDIDLNAPFIVSKAVLPSMIAKGHGKIINICSMMSELGRETVSAYAAAKG 163

Query: 198 AISNFTQWIAVHFSKVGIRCNAIAPGFLVTNQNRSLLFTE-DGQPTARAEKILNNTPMGR 256
+ T+ +A F + I+CN I PG++ T Q L + DG + I+ TP R
Sbjct: 164 GLKMLTKNIASEFGEANIQCNGIGPGYIATPQTAPLRERQADGSRHPFDQFIIAKTPAAR 223

Query: 257 FGEASELIGGLFFLADEKSSSFVNGVVLPIDGGFAAYSG 295
+G +L G FLA + +S+FVNG +L +DGG AY G
Sbjct: 224 WGTTEDIAGPAVFLASD-ASNFVNGHILYVDGGILAYIG 261

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 381 

A DNA sequence (GBSx0412) was identified in S.agalactiae <SEQ ID 1243> which encodes the amino
acid sequence <SEQ ID 1244>. This protein is predicted to be D-mannonate dehydrolase (uxuA). Analysis
of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3188 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB04425 GB:AP001509 D-mannonate dehydrolase [Bacillus halodurans] 
Identities = 202/343 (58%) , Positives = 261/343 (75%) 

Query: 1 MEMSFRWYGEDDPVTLENIGQIPTMKGIVTAIYDVPVGEVWSRERIQQLKEKVEAAGLKI 60
M ++ RW+G D V LE I QIP MKGIV+AIYDV VG VW +E+I LK +E GL +
Sbjct: 1 MRLTMRWFGPSDKVKLEYIKQIPGMKGIVSAIYDVAVGGVWPKEKILALKNNIERHGLTL 60

Query: 61 SVIESVPVHEDIKLGRPTRDLLIDNYIQTVKNLAAEGIDTICYNFMPVFDWTRTDLAYQY 120
VIESVPVHEDIKLG+PTRD I+NY QT+++LA GIDT+CYNFMPVFDWTR+ L ++
Sbjct: 61 DVIESVPVHEDIKLGKPTRDRYIENYKQTLRHLAECGIDTVCYNFMPVFDWTRSQLDFKL 120

Query: 121 PDGSTALIFDETVSKKMDPVNGELSLPGWDASYSKEEMKAIMDAYAEIDEEKLWENLTYF 180
DGS ALI++E V + +P++GEL LPGWD SY E +K ++ AY +I EE LW++LTYF
Sbjct: 121 EDGSEALIYEEDVISRTNPLSGELELPGWDTSYENESLKGVLQAYKKISEEDLWDHLTYF 180

Query: 181 IKRIIPEAEAVGVKMAIHPDDPPYSIFGLPRIITGLEAIERFVKLYDSKSNGITLCVGSY 240
++ I+P A+ VG+KMAIHPDDPP+SIFGLPRI+T +ER + LYDS ++GIT+C GS
Sbjct: 181 VQAIMPVADEVGIKMAIHPDDPPWSIFGLPRIVTNKANLERLLSLYDSPNHGITMCSGSL 240

Query: 241 ASDPQNDVLEISRRAFELDRVNFVHARNIKLGDGKSFKESAHPSEYGSIDMYEVIKLCHE 300
++ ND+ E+ R R++F HARNIK +SF+ESAH SE GS++M ++K H+
Sbjct: 241 GANFANDLPEMIRHFGGQGRIHFAHARNIKRTGPRSFQESAHLSEAGSVNMVAMLKAYHD 300

Query: 301 FGFEGAIRPDHGRMIWGETGRPGYGLYDRALGATYVSGLYEAV 343
GF G +RPDHGRMIWGE GRPGYGLYDRALGATY++G++EAV
Sbjct: 301 IGFTGPLRPDHGRMIWGEKGRPGYGLYDRALGATYLNGIWEAV 343



No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
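
The alignment summary lines in these Examples (for instance "Identities = 202/343 (58%) , Positives = 261/343 (75%)") follow a regular pattern, so the counts can be pulled out mechanically when comparing Examples. The following is only a minimal sketch: the function name `parse_alignment_stats` and the dictionary layout are illustrative assumptions and are not part of the original analysis. It reads the percentages as printed and does not recompute them.

```python
import re

# Hypothetical helper (not from the specification): matches fields such as
# "Identities = 202/343 (58%)", tolerating the stray spaces left by text extraction.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_stats(line):
    """Return {'identities': {...}, 'positives': {...}, 'gaps': {...}} for one summary line.

    Only the fields present in the line are returned; a line without a
    'Gaps' field simply yields no 'gaps' entry.
    """
    stats = {}
    for name, count, length, percent in STAT_RE.findall(line):
        stats[name.lower()] = {
            "count": int(count),               # matching positions
            "length": int(length),             # aligned length
            "reported_percent": int(percent),  # value as printed in the text
        }
    return stats


if __name__ == "__main__":
    summary = "Identities = 202/343 (58%) , Positives = 261/343 (75%)"
    for field, values in parse_alignment_stats(summary).items():
        print(field, values)
```

Keeping the printed percentage alongside the raw counts makes it possible to flag lines where extraction has damaged one of the numbers.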

Example 382 

A DNA sequence (GBSx0413) was identified in S.agalactiae <SEQ ID 1245> which encodes the amino 
acid sequence <SEQ ID 1246>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2447 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
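
Each "Final Results" block lists three compartments with a certainty value, and the compartment with the highest certainty is the one marked "(Affirmative)". The sketch below shows one way such a block could be read programmatically. The names `parse_localization` and `best_compartment` are hypothetical, and the pattern is an assumption based only on the lines shown in these Examples; it tolerates the spaces that extraction inserts inside numbers such as "0 . 2447".

```python
import re

# Assumed line shape, taken from the blocks in this text:
#   bacterial cytoplasm --- Certainty=0.2447 (Affirmative) < succ>
LOC_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\b.*?"
    r"Certainty\s*=\s*([0-9]+\s*\.\s*[0-9]+)\s*\(([^)]+)\)",
    re.IGNORECASE,
)


def parse_localization(lines):
    """Map compartment name -> (certainty, label) for every matching line."""
    scores = {}
    for line in lines:
        match = LOC_RE.search(line)
        if match:
            compartment = match.group(1).lower()
            # Remove whitespace inside the number before converting it.
            certainty = float(re.sub(r"\s+", "", match.group(2)))
            scores[compartment] = (certainty, match.group(3).strip())
    return scores


def best_compartment(scores):
    """Return the compartment with the highest certainty value."""
    return max(scores, key=lambda name: scores[name][0])


if __name__ == "__main__":
    block = [
        "bacterial cytoplasm --- Certainty=0.2447 (Affirmative) < succ>",
        "bacterial membrane Certainty=0 . 0000 (Not Clear) < suco",
        "bacterial outside Certainty=0 . 0000 (Not Clear) < suco",
    ]
    parsed = parse_localization(block)
    print(parsed)
    print(best_compartment(parsed))
```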

Example 383 

A DNA sequence (GBSx0414) was identified in S.agalactiae <SEQ ID 1247> which encodes the amino 
acid sequence <SEQ ID 1248>. This protein is predicted to be uronate isomerase. Analysis of this protein 
sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3066 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04424 GB:AP001509 uronate isomerase [Bacillus halodurans] 
Identities = 215/465 (46%), Positives = 294/465 (62%), Gaps = 7/465 (1%) 

Query: 3    FNTETFMLKNQAAIQLYEE-VKRQPIFDYHCHLDPKDIFEDHIFDNIVDLWLGGDHYKWR 61 
            F +E F+L N+ +LY K PI DYHCHL P++I+E+ F+N+ WLGGDHYKWR 
Sbjct: 4    FLSEDFLLMNEYDRELYYTFAKNMPICDYHCHLSPQEIWENKPFENMTKAWLGGDHYKWR 63 

Query: 62   LMRANGISFAEITGPASNLEKFKAFARTLERAYGNPVYHWSAMELKNVFGVNEILTESNA 121 
            MR NG+ E ITG A + EKF A+A+T+ + GNP+YHW+ MELK F ++ L E+N 
Sbjct: 64   AMRmGTOEEFITGGAPDKEKFIAWAKWPKTIGNPLYHWTHMELKTYFHFHQPLDETNG 123 

Query: 122  EEIYHRI^FLKEHKISPRRLIADSKVMFIGTTDHPLDTLEWHKKLAADESFKTWAPTF 181 
            E ++ N L++ +PR LI S V IGTTD P D+L +H+KL AD++F V PTF 
Sbjct: 124  ENVWDAOTOLLQQFAFTPRALIERSNVRAIGTTDDPTDSLLYHQKLQADDTFHVKVIPTF 183 

Query: 182  RPDEAF-IEHRHFVDFITKLGDITQKEITDFSTFIAAMEERIAYFAQNGCRASDISFTEI 240 
            RPD A IE F D++ KL D+T + + F+ A++ER+ +F ++GCR+SD TE+ 
Sbjct: 184  RPDGALKIEQDSFADWVAKLSDWGESLDTLDAFLHALKERLTFFDEHGCRSSDHDMTEV 243 

Query: 241  VFEQTDELELNDLFNKVCEGYIPNQSEISKWQTAvPMELCRLYKKYGFOTQVHFGADRNN 300 
            F + +E E +F K + E K++T + L + Y G+V Q H G +RNN 
Sbjct: 244  PFVEVNEQEAQHIFRKRIANEGLTKVENEKYKTFLMTWI^ 303 

Query: 301  HSTIFEKLGADVGVDSLGD-QVALTVNMNRLLDSLVKKD^ 359 
            +S + KLG D G DS+GD Q+A +LLD L K+ +LPK I Y +NP N +A+ 
Sbjct: 304  NSRMLHKLGPDTGFDSIGDGQIAHAT--AKLLDLLDKQGALPKTILYCTOPNANYILASM 361 

Query: 360  Li^FQANELGVRSYLQFGAGWWFADTKLGMISQMNALAEQGMLANFIGMLTDSRSFLSYQ 419 
            + NF E GVR +QFG+ WWF D GM Q+ LA G+L+NFIGMLTDSRSFLSY 
Sbjct: 362  IGNF--TESGVRGKVQFGSAWWFNDHIDGMRRQLTDLASVGLLSNFIGMLTDSRSFLSYP 419 

Query: 420  RHDYFRRILCTYLGEWIEEGEVPEDYQALGSMAKDIAYQNAVMYF 464 
            RHDYFRRILC +G WI+EG++P D + G + +DI Y N V+YF 
Sbjct: 420  RHDYFRRILCQLIGSWIKEGQLPPDMERWGQIVQDICYNNWDYF 464 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 384 

A DNA sequence (GBSx0415) was identified in S.agalactiae <SEQ ID 1249> which encodes the amino 
acid sequence <SEQ ID 1250>. This protein is predicted to be 2-dehydro-3-deoxyphosphogluconate 
aldolase/4-hydroxy-2-oxoglutarate al. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3883 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9703> which encodes amino acid sequence <SEQ ID 9704> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD35160 GB:AE001693 2-dehydro-3-deoxyphosphogluconate 
aldolase/4-hydroxy-2-oxoglutarate aldolase [Thermotoga maritima] 
Identities = 93/199 (46%), Positives = 125/199 (62%), Gaps = 6/199 (3%) 

Query: 37   KNNYFFAVIRGKSSEDALEIAKHAILGGIRNIEVTFSTPEASKVIKQLSDDFKNNKEIIV 96 
            K + AV+R S E+A E A GG+ IE+TF+ P+A VIK+LS F K I+ 
Sbjct: 8    KKHKIVAVTiRANSvEFAKEKAIAVFEGGVBLIEITFTVPDADTVIKEI)S--FLKEKGAII 65 

Query: 97   GAGTVMTTELAKEAIDAGAKFLVSPHFDSDIANLANENKVYYFPGCATATEIWARKYKC 156 
            GAGTV + E ++A+++GA+F+VSPH D +I+ E V+Y PG T TE+V A K 
Sbjct: 66   GAGTVTSVEQCRKAVESGAEFIVSPHLDEEISQFCKEKGVFYMPGVMTPTELVKAMKLGH 125 

Query: 157  QIIKLFPGGWGPGFIKDIHGPIPDVDLMPSGGVSVSNWEWRKAGAVAVGVGSALSSKV 216 
            I+KLFPG WGP F+K + GP P+V +P+GGV++ NV EW KAG +AVGVGSAL 
Sbjct: 126  TILKLFPGEWGPQWKAMKGPFPNVKFVRTGGVNLDNVCEWFKAGVLAVGVGSALVKGT 185 

Query: 217  ATEGYDSVTKIAKQFVSAL 235 
            D V + AK FV + 
Sbjct: 186  P DEVREKAKAFVEKI 200 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1251> which encodes the amino acid 
sequence <SEQ ID 1252>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1039 (Affirmative) < succ> 



WO 02/34771 



PCT/GB01/04789 



-488- 



bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below: 

Identities = 82/204 (40%) , Positives = 132/204 (64%) 



Query: 32   MLNQLKNNYFFAVIRGKSSEDALEIAKHAILGGIKNIEVTFSTPEASKVIKQLSDDFKNN 91 
            +L +LK N V+RG+SSE+AL + +I GGI+ IEVT++ P AS+VI QL++ FK + 
Sbjct: 6    ILTIO^KANRLVLvWGESSEEALACSIASIEGGIKTlEvTyTNPFASEVIGQLAERFKED 65 

Query: 92   KEIIVGAGTVMTTELAKEAIDAGAKFLVSPHFDSDIANLANENKVYYFPGCATATEIWA 151 
            E+++GAGTV+ A++AI AGA+F+V P+F+ +A + + + Y PGC T E+V A 
Sbjct: 66   PEVLIGAGTVLDDVTARQAIIAGAQFIVGPNFKTRAVALICHRYSIPYLPGCMTVNEVVTA 125 

Query: 152  RKYKCQIIKLFPGGWGPGFIKDIHGPIPDVDLMPSGGVSVSNWEWRKAGAVAVGVGSA 211 
            + ++K+FPG VG FI+ I P+P V++M +GGVS N+ +W AG +G+G 
Sbjct: 126  LESGVDMVKIFPGSTVGISFIRAIKSPLPQWvMvTGGVSSDNLKDWLAAGVDVLGIGGE 185 

Query: 212  LSSKVATEGYDSVTKIAKQFVSAL 235 
            + + + Y+ +TK A ++ +L 
Sbjct: 186  FNQIASQKQYNLITKKAAHYIKSL 209 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 385 

A DNA sequence (GBSx0416) was identified in S.agalactiae <SEQ ID 1253> which encodes the amino 
acid sequence <SEQ ID 1254>. This protein is predicted to be pyruvate dehydrogenase complex repressor. 
Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2827 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12044 GB:Z99105 similar to transcriptional regulator (GntR family) [Bacillus subtilis] 
Identities = 67/225 (29%), Positives = 119/225 (52%), Gaps = 17/225 (7%) 

Query: 3    RPLVEQTADRLLHLILEREYPVGAKLPNEYELAEDLDVGRSTIREAVRSLATRNILEVRQ 62 
            + L +Q +R++HL+ + G KLP E EL + L V R +REA+ SL T ++ + 
Sbjct: 16   KTLAKQVIERIVHLLSSGQLRAGDKDPTEMELMDILHVSRPVLREALSSLETLGVITRKT 75 

Query: 63   GSGTYISSKKGVSEDPLGFSLIKDTDRLTSDLFELRLLLEPRIAELVAYRITDDQLQI.IjE 122 
            GTY + KG+ P L TDL + + ER+LE + +A+I +++LQ L+ 
Sbjct: 76   RGGTYFNDKIGM--QPFSvMIALATDNLPA-IIEARMALELGLVTIAAEKINEEELQRLQ 132 

Query: 123  KLVGDIEDAV--HAGDPKHLLLDWFHSMIAKYSGNIAMDSLLPVINQSIHLINANYTNR 180 
            K + DI ++ H G+ D EFH ++A + N ++ ++ QS+ + +A ++ 
Sbjct: 133  KTIDDIANSTDNHYGE-ADKEFHRI IALSANNPWEGMI QSLLITHAKIDSQ 183 

Query: 181  QMKSDSLEAHREIIKAIREKNPVAAHDAMLMHIMSVRRSALK 222 
            + + ++E H++I A+ +++P AH M H+ VR LK 
Sbjct: 184  IPYREPJ3VTVEYHKKIYDALAQRDPYKAHYHMYEHLKFVRDKILK 228 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1255> which encodes the amino acid 
sequence <SEQ ID 1256>. Analysis of this protein sequence reveals the following: 
Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2161 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 24/51 (47%), Positives = 35/51 (68%) 

Query: 22   YPVGAKLPNEYELAEDLDVGRSTIREAVRSLATRNILEVRQGSGTYISSKK 72 
            +P+G++LP+E LAE V R T+R+A+ L ILE R GSGTY++S + 
Sbjct: 30   WPIGSRLPSERHLAEHFTVSRMTLRQAITLLVEEGILERRIGSGTYVASHR 80 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 386 

A DNA sequence (GBSx0417) was identified in S.agalactiae <SEQ ID 1257> which encodes the amino 
acid sequence <SEQ ID 1258>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2178 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9701> which encodes amino acid sequence <SEQ ID 9702> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA58911 GB:X84105 gluceronidase [synthetic construct] 
Identities = 258/602 (42%), Positives = 357/602 (58%), Gaps = 31/602 (5%) 

Query: 23   MLYPLLTKTRNTYDLGGIWNFKLGEHNPN ELLPSDEVMVIPTSFNDLMVSKEK 75 
            ML P+ T TR L G+W F L N L + +P SFND + 
Sbjct: 1    MLRPVETPTREIKKLDGLWAFSLDRENCGIDQRWWESALQESRAIAVPGSFNDQFADADI 60 

Query: 76   RDYIGDFWYEKVIEVPKVSEDEEMVLRFGSVTHQAKIYVDGVLVGEHKGGFTPFEVLVPE 135 
            R+Y G+ WY++ + +PK + +VLRF +VTH K++V+ V EH+GG+TPFE V 
Sbjct: 61   RNYAGNVWQREVFIPKGWAGQRIVLRFDAOTHYGKVWVNNQEvMEHQGGYTPFEADVTP 120 

Query: 136  CKYNNEKIKVSICANNVLDYTTLPVGNYSEIIQEDGSIKKKVRENFDFFNYAGVHRPLKL 195 
            + +++++C NN L++ T+P G I E+G KKK DFFNYAG+HR + L 
Sbjct: 121  YVIAGKSVRITVCVNNELNWQTIPPGIW--ITDENG--KKKQSYFHDFFNYAGIHRSVML 176 

Query: 196  MIRPKNHIFDITITSRLSDDLQSADLHFLVETNQKVDEVRISVFDEDNKLV--GETKDSR 253 
            P + DIT+ + ++D A+ + VN +V++DD ++V G+ 
Sbjct: 177  YTTPNTWVDDITVVTHVAQDCNHASVDWQVVAN GDVSVELRDADQQWATGQGTSGT 233 

Query: 254  LFLSDVHLWEVLNAYLYTARVEIFVDNQLQDVYEEMFGLREIEVTNGQFLLNRKPIYFKG 313 
            L + + HLW+ YLY V + D+Y G+R + V QFL+N KP YF G 
Sbjct: 234  LQWNPHLWQPGEGYLYELCVTAKSQTEC-DIYPLRVGIRSVAVKGEQFLINHKPFYFTG 292 

Query: 314  FGKHEDTFINGRGIjNFAAtttMDIjMjLKDMGANSFRT 373 
            FG+HED + G+G + + D L+ +GANS+RTSHYPY+EEM+ AD G++VIDE 
Sbjct: 293  FGRHEDADLRGKGFDNV1MVHDHALMDWIGANSYRTSHYPYAEEMLDWADEHGIVVIDET 352 

Query: 374  PAVGLFQNFNASLDLS PKDNGTWNLM--QTKAAHEQAIQELVKRDKNHPSWMW 425 
            AVG EN SI) + PK+ + + +T+ AH QAI+EL+ RDKNHPSWMW 
Sbjct: 353  AAVG FNLSLGIGFEAGNKPKELYSEEAWGETQQAHLQAIKELIARDKNHPSVVMW 408 

Query: 426  WANEPASHEAGAHDYFEPLVKLYKDLDPQIOIPVTLTOIIjMATPDRDQWDLVDWCENR 485 
            +ANEP + GA +YF PL + + LDP RP+T VN++ D + DL DV+CLNR 
Sbjct: 409  S IAISffiPDTRPQGAREYFAPLAEATRKLDPT-RPITCVNVMFCDAHTDTI SDLFDVLCLNR 467 

Query: 486  YYGWYVDHGDLTNAEVGIRKELLEWQDKFPDKPIIITEYGADTLPGLHSTWNIPYTEEFQ 545 
            YYGWYV GDL AE + KELL WQ+K +PIIITEYG DTL GLHS + ++EE+Q 
Sbjct: 468  YYGWWQSGDLETAEKVLEKELLAWQEKL-HQPIIITEYGVDTIAGLHSMYTDMWSEEYQ 526 

Query: 546  CDFYEMSHRVFDGIPNLVGEQVWNFADFETt^ILRVQGNHKGLFSRNRQPKQVVKEFKK 605 
            C + +M HRVFD + +VGEQVWNFADF T+ ILRV GN KG+F+R+R+PK +K 
Sbjct: 527  CAWLDMYHRVFDRVSAWGEQVWNFADFATSQGILRVGGNKKGIFTRDRKPKSAAFLLQK 586 

Query: 606  RW 607 
            RW 
Sbjct: 587  RW 588 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1259> which encodes the amino acid 
sequence <SEQ ID 1260>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -4.04 Transmembrane 1131 - 1147 ( 1130 - 1147) 

Final Results 

bacterial membrane --- Certainty=0.2614 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAF97242 GB:AF282987 beta-galactosidase precursor [Streptococcus pneumoniae] 
Identities = 303/921 (32%), Positives = 463/921 (49%), Gaps = 86/921 (9%) 

Query: 5    QKSSEIVT RTITKPSRATSNVKQEIDMTPDSKEQTVTGYQYHYIDQ--EGRKQPFN 58 
            +K E VT + KP ++ + ++ ++Q E RK FN 
Sbjct: 96   KKEDEAOTPKEEKVSAKPEEKAPRIESQASNQEKPLKEDAKAVTNEEVNQMIEDRKVDFN 155 

Query: 59   QGWRF-LMflDVACAQDPSFDDSNWQVIHLPHDFSLTQPYTRNGEA--ESAYKLGGVGWYR 115 
            QWFLA+ APDSW+ + LP+D+S+ + A E GG WYR 
Sbjct: 156  QNWYFKliNANSKFA.IKPDADVSTWKiajDLPYDWSIFNDFDHESPAQNEGGQIiNGGFA.WYR 215 

Query: 116  HYLVLDEVLAGCHVAITFEGSYMETEIYWGQFIGKHIiNGYQEFTYDISDVVTF-GAENL 174 
            LDE +V +TF+G YM++++YVNGQ +G + NGY +F+YDI+ + G EN+ 
Sbjct: 216  KTFKlDEKDLKKNTOLTFDGVYMDSQVYVNGQLVGHYPNGyNQFSYDITKYLQKDGRENV 275 

Query: 175  IATOVENKVPSSRWYSGSGLYREVSLSVLPQLHFVADQVAMTLADTAVQEKGQQKVDLRF 234 
            +AV NK PSSRWYSGSG+YR+V+L V ++H + + Q+ G+ + + 
Sbjct: 276  IAVHAVNKQPSSRWYSGSGIYRDWLQVTDKVHVEKNGTTILTPKLEEQQHGKVETHVTS 335 

Query: 235  ALNQSIQTCHYQLSLCLWEQSHCSKDKKLLYQETEVPLADLAFQRQYGLT--LSLEELQL 292 
            + +H++E + + L LL+E+L 
Sbjct: 336  KIVNTDDKDHELVA EYQIVERGGHAVTGLVRTASRTLKAHESTSLDAILEVERPKL 391 

Query: 293  WSP--DNPHLYDLELTLYYQGQVIDCFOJETGFRQLTFMANQGLFVNGRAVKLKGVCLHH 350 
            W+ DP LY+L +Y GQ++D G+R + N+G +NG +K GV LHH 
Sbjct: 392  WTVLNDKPALYELITRVYRDGQLVDAK1CDLFGYRYYHWTPNEGFSLNGERIKFHGVSLHH 451 

Query: 351  DC^GLGACAYEDALARQLVLLKDMGANTIRSTHNPSSPKLRQLANRLGFFVIEFA.FDTWT 410 
            D G LGA A R+L +K+MG N+IR+THNP+S + Q+A LG V EEAFDTW 
Sbjct: 452  DHGALGAEENYKAEYRRLKQMKEMGVNSIRTTHNPASEQTLQIAAELGLLVQEEAFDTWY 511 

Query: 411  YAKNGNVNDFSNYFHQTIGTENANYLQRTOSPETSWAQYSIFAMVWSAKNDPSVLMWSIG 470 
            K D+ +F + A ++ W+ + + MV KN+P++ MWSIG 
Sbjct: 512  GGK--KPYDYGRFFEKDATHPEARKGEK WSDFDLRTMVERGKNNPAIFMWSIG 562 

Query: 471  NELMEGFSADVSHYPELTRQMCQWITAIDTSRPITFGDNKI.KEADFC-WHEEVSQMATLL 529 
            NE+ G + +H +++ +I +D +R +T G +K + + HE+++ 
Sbjct: 563  NEI--GFJ«IGDAHSLATVKRLVKVIKDVDKrRYVTMGADKFRFGNGSGGHEKIA 614 

Query: 530  SQLDHPQGLIGL1TYADGKDYDRLHEEHSDWLLYGSETVSAITSR-AYYKETKKVLDS--- 585 
            +LD +G NY++ +Y L +H WL+YGSET SA +R +YY+ +++ S 
Sbjct: 615  DELD AVGFNYSE-DNYKALRAKHPKWLIYGSETSSATRTRGSYYRPERELKHSNGP 669 

Query: 586  --GYHLTSYDHAKVDWGAFASQAWYDTITRDFV--AGECVWTGFDYLGEPTPWNKIDSGV 641 
            Y + Y + +V WG A+ +W T RD AG+ +WTG DY+GEPTPW+ + 
Sbjct: 670  ERNYEQSDYGNDRVGWGKTATASW--TFDRDNAGYAGQFIWTGTDYIGEPTPWHNQNQTP 727 

Query: 642  VGLWPSPKNAYFGILDTAGFPKDSYYFYQSQW--AQGQTTLHLLPVWQKD QLCFD 694 
            V K++YFGI+DTAG PK +Y YQSQW + + +HLLP W + D 
Sbjct: 728  V KSSYFGIVDTAGIPKHDFYLYQSQWVSVKKKP^1VHLLPHWNWENKEljASKVAD 781 

Query: 695  EQGLVEVWYSNAASVQLMFEDEQGNLTDYGRKAFHTYSTPTGHTYQLYQGADAAKMPHE 754 
            +G + V YSNA+SV+L N G K F+ T G TYQ +GA+A 
Sbjct: 782  SEGKIPVRAYSNASSVELFL NGKSLGLKTFNKKQTSDGRTYQ--EGANA N 829 

Query: 755  NLYLTWRVPYQKGLLRAVAYDISGKSIPKTSGRSQVRTYGSVAKLSWKAFEAPIDAPW-E 813 
            LYL W+V YQ G L A+A D SGK I R++TGA+ +IA + 
Sbjct: 830  ELYLEWKVAYQPGTLEAIARDESGKEI ARDKITTAGKPAAVRLIKEDHAIAADGKD 885 

Query: 814  LLYLDLSLLDSRGELVSHAQDWLQVQVEGPARBLALDNGNPTDHTPYQEP LRQAY 868 
            L Y+ ++DS+G +V A + ++ Q+ G +L+ +DNG Y+ . +R+A+ 
Sbjct: 886  LTYIYYEITOSQC3NWPTANNLVRFQLHGQGQLVGVDNGEQASRERYKAQADGSWIRKAF 945 

Query: 869  GGKLLAILALTGEAGHIKVTA 889 
            GK +AI+ T +AG +TA 
Sbjct: 946  NGKGVAIVKSTEQAGKETLTA 966 


An alignment of the GAS and GBS proteins is shown below: 

Identities = 98/414 (23%), Positives = 175/414 (41%), Gaps = 64/414 (15%) 

Query: 54   LPSDEVMVIPTSFNDLMVSKEKRDYIGDFWYEKVIEVPKVSEDEEMVLRFGSVTHQAKIY 113 
            LPD + P + N SK+GWY+ + +V + + F + +IY 
Sbjct: 86   LPHDFSLTQPYTRNGEAESAYKLGGVG--WYRHYLVLDEVLAGCHVAITFEGSYMETEIY 143 

Query: 114  VDGVLVGEHKGGFTPFEVLVPECKYNNEKIKVSICANNVLDYTTLPVGNYSEIIQEDGSI 173 
            V+G +G+H G+ F + + V+ A N+L + 
Sbjct: 144  VNGQFIGKHIM3YQEFTYDISDV VTFGAENLLAVR V 179 

Query: 174  KKKVRENFDFFNYAGVHRPLKLMIRPKNHIFDITITSRLSDDL QSADLHFLVET 227 
            + KV + +++ +G++R + L + P+ H + L+D Q DL F + 
Sbjct: 180  ENKVPSS-RWYSGSGLYREVSLSVLPQLHFVADQVAMTLADTAVQEKGQQKVDLRFALNQ 238 

Query: 228  NQKVDEVRISVF DEDNKLVGETKDS RLFLSDVHLWEVLNA 267 
            + + ++S+ +D KL+ + + L L ++ LW N 
Sbjct: 239  SIQTCHYQLSLCLWEQSHCSKDKKLLYQETEVPLADLAFQRQYGLTLSLEELQLWSPDNP 298 

Query: 268  YLYTARVEIFVDNQLQDVYEENFGLREIE-VTNGQFLLNRKPIYFKGFGKHEDTFINGRG 326 
            +LY + ++ Q+ D + G R++ +N +N + +KG HD G 
Sbjct: 299  HLYDLELTLYYQGQVIDCFCLETGFRQLTFMANQGLFVNGRAVKLKGVCLHHDQGGLGAC 358 

Query: 327  IJNEAANLMDL^^JI^KDMGANSFRTSHYPYSEE^MRIADRMGVI/VIDEVPAVGLFQ NFN 383 
            E A L LLKDMGAN+ R++H P S ++ +LA+R+G VI +E + N N 
Sbjct: 359  AYEDALARQLVLLKDMGANTIRSTHNPSSPKLRQLANRLGFFVIEEAFDTWTYAKNG^ 418 

Query: 384  ASLDLSPKDNGTWN LMQTKAAH EQAIQELVKRDKNHPSWMWWANE 430 
            + + GT N L + ++ + +I+ +V KN PSV+MW + NE 
Sbjct: 419  DFSNYFHQTIGTENANYLQRTOSPETSWAQYSIEAMVWSAKNDPSVLMWSIGNE 472 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 387 

A DNA sequence (GBSx0418) was identified in S.agalactiae <SEQ ID 1261> which encodes the amino 
acid sequence <SEQ ID 1262>. This protein is predicted to be 2-keto-3-deoxygluconate kinase. Analysis of 
this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -0.53 Transmembrane 197 - 213 ( 197 - 213) 

Final Results 

bacterial membrane --- Certainty=0.1213 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9699> which encodes amino acid sequence <SEQ ID 9700> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD35161 GB:AE001693 2-keto-3-deoxygluconate kinase [Thermotoga maritima] 
Identities = 115/342 (33%) , Positives = 180/342 (52%) , Gaps = 16/342 (4%) 



Query: 14   KIISLGEVLLRLSPPQYHTLMQANHLKCQFGGSELNVLASLAQLGYHVGLVSALPDNDLG 73 
            K+++ GE++LRLSPP + + Q + +GG+E NV A LAQ+G V+ LP+N LG 
Sbjct: 2    KVVTFGEIMLRLSPPDHKRIFQTDSFDVTYGGAEANVAAFLAQMGLDAYFVTKLPNNPLG 61 

Query: 74   KMASQFILSQQISPAAIIKKEGRLGIYYYEQGFSVRTNKVIYDRNYSSFWESTLSDYDFT 133 
            A+ + + I + R+GIY+ E G S R +KV+YDR +S+ E+ D+D+ 
Sbjct: 62   DAAAGHLRKFGVKTDYIARGGNRIGIYFLEIGASQRPSKWYDRAHSAISEAKREDFDWE 121 

Query: 134  SIFKGVDWFHVSGITPALTKDLYEVTRFLMTKAKEGGVKVSIDLNFRESLWSSFQEAREQ 193 
            I G WFH SGITP L K+L + + A E GV VS DLN+R LW+ +EA++ 
Sbjct: 122  KILDGARWFHFSGITPPLGKELPLILEDALK^ANEKGVTVSCDLNYRARLWTK-EEAQKV 180 

Query: 194  LSPLLGLLDVCFGLEPIYLAGESEDLKDELGLSRPYLDI ELLEKITQKIVQEY 246 
            + P + +DV L ED++ LG+S LD+ E KI +++ ++Y 
Sbjct: 181  MIPFMEYVDV LIANEEDIEKVLGI SVEGLDLKTGKLNREAYAKIAEEVTRKY 232 

Query: 247  GLDYIAFTQREMEYTNQYMLKSYLYHNNMLYQTDKTGVEVLDRVGTGDAFAAGLIHALLE 306 
            + T RE ++ N + +++ + ++DRVG GD+FA LI+ L 
Sbjct: 233  NFKWGITLRESISATVNYWSVMVFENGQPHFSNRYEIHIVDRVGAGDSFAGALIYGSLM 292 

Query: 307  KETPQRALEIAMATFKYKHTIQGDINIMTRDDIAYLIEKETN 348 
            Q+ E A A KHTI GD +++ ++I L T+ 
Sbjct: 293  GFDSQKKAEFAAAASCLKHTIPGDFWLSIEEIEKLASGATS 334 





A related DNA sequence was identified in S. pyogenes <SEQ ID 1263> which encodes the amino acid 
sequence <SEQ ID 1264>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0708 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 111/319 (34%) , Positives = 168/319 (51%) , Gaps = 7/319 (2%) 






Query: 12   ^KIISLGEVLLRLSPPQYHTLMQANHLKCQFGGSELNVLASLAQLGYHVGLVSALPDND 71 
            M+K++ +GE L+R+SP Q+ L A + FGGSE+N+ +L G L +ALPDN 
Sbjct: 14   MSKLLLVGEPLIRVSPNQFQPLTNACEAQLFFGGSEVNIARTLGGFGLEARLFTALPDNP 73 

Query: 72   LGKMASQFILSQQISPAAIIKKEGRLGIYYYEQGFSVRTNKVIYDRNYSSFWESTLSDYD 131 
            +G QF+ + + + R+G+YY E GF R ++V YDR SSF D 
Sbjct: 74   VGHAFHQFLKQSGVDMSLTAWQGHRVGLYYLENGFGCRASQVYYDRCGSSFSALDKDSLD 133 

Query: 132  FTSIFKGVDWFHVSGITPALTKDLYEVTRFLMTKAKEGGVKVSIDLNFRESLWSSFQEAR 191 
            +IF+G+ FH SGI+ Hi K ++ L+ +AK+ + +S DLNFR S+ + +A+ 
Sbjct: 134  LAAIFEGISHFHFSGISLALGKKTQDLIEVLVREAKKRDICISFDLNFRSSM-IAVADAK 192 

Query: 192  EQLSPLLGLLDVCFGLEPIYIAGESEDLKDELGLSRPYLDIELLEKITQKIVQEYGLDYI 251 
            S D+ FG+EP+ L + D+D R D ++ +QYLI 
Sbjct: 193  RLFSHFAQYADIIFGMEPLLLDSDDFDMFD RKKADTTTIRERLAGLYQRYQLQAI 247 

Query: 252  AFTQREMEYTNQYMLKSYLYHNNMLYQTDKTGVEVLDRVGTGDAFAAGLIHALLEKETPQ 311 
            T+R + K+Y Y + Y++ + VL RVG+GDAF AGL++ LLE Q 
Sbjct: 248  YHTERSNDAQGSNHFKAYAY-DRQFYESCEVTTPVLQRVGSGDAFVAGLLYQLLEGNEKQ 306 

Query: 312  RALEIAMATFKYKHTIQGD 330 
            R L+ A+AT K T+ D 
Sbjct: 307  RNLDFAVATASLKCTVAED 325 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 388 

A DNA sequence (GBSx0419) was identified in S.agalactiae <SEQ ID 1265> which encodes the amino 
acid sequence <SEQ ID 1266>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood = -1.17 Transmembrane 5 - 21 ( 5 - 21) 

Final Results 

bacterial membrane --- Certainty=0.1468 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 389 

A DNA sequence (GBSx0420) was identified in S.agalactiae <SEQ ID 1267> which encodes the amino 
acid sequence <SEQ ID 1268>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood =-12.05 Transmembrane 198 - 214 ( 191 - 220) 
INTEGRAL Likelihood =-11.68 Transmembrane 446 - 462 ( 437 - 467) 
INTEGRAL Likelihood = -9.55 Transmembrane 94 - 110 ( 91 - 116) 
INTEGRAL Likelihood = -7.43 Transmembrane 291 - 307 ( 283 - 309) 
INTEGRAL Likelihood = -4.88 Transmembrane 265 - 281 ( 257 - 282) 
INTEGRAL Likelihood = -4.62 Transmembrane 321 - 337 ( 318 - 339) 
INTEGRAL Likelihood = -3.93 Transmembrane 406 - 422 ( 405 - 426) 
INTEGRAL Likelihood = -1.59 Transmembrane 121 - 137 ( 121 - 137) 
INTEGRAL Likelihood = -1.12 Transmembrane 345 - 361 ( 345 - 362) 
INTEGRAL Likelihood = -0.48 Transmembrane 43 - 59 ( 43 - 59) 



Final Results 

bacterial membrane --- Certainty=0.5819 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB13641 GB:Z99113 similar to H+-symporter [Bacillus subtilis] 
Identities = 105/452 (23%), Positives = 182/452 (40%), Gaps = 37/452 (8%) 

Query: 36   IYLFTFMFVTYFSTGVLGSAAIFVSQIMGYIRIFDGFIDPAIGIMIDKTDTKFGKYRPIL 95 
            IY ++ +F T V G +A + +RI D DP IG ++D+T+++F ++RP L 
Sbjct: 27   IYATVSTYLLFFYTDVFGLSAAAAGTMFLVVRIIDALADPFIGTIVDRTNSRFARFRPYL 86 

Query: 96   IIGNVTTALSLIFLLALRGVDENIRFPLFILVLIIHKIGYSMQQTITKAGQTALTNDPKQ 155 
            + G A+LL + ++I+GS+T ALT+ 
Sbjct: 87   LFG AFPFVILAILCFTTPDFSDMGKLIYAYITYVGLSLTYTTINVPYGALTS-AMT 141 

Query: 156  RPIFNIVDAVMTTSLMTGGQFWSVFLVPKFGNFTPQFFNVLIFGTILISAILAIV--AI 213 
            R +V L +V F VP + G L IL ++ + 
Sbjct: 142  RNNQEWSITSVRMLFANLGGLVVAFFVPLLAAYLSDTSGNESLGWQLTMGILGMIGGCL 201 

Query: 214  IGIWAKDRKEFFGLGENTQKTALKDYWKVLKGNKPLQILSIAAALVKFAIQFFGDSV-VM 272 
            + K KE L ++ +K D ++ + N+PL +LSI ++ F + +SV + 
Sbjct: 202  LIFCFKSTKERVTLQKSEEKIKFTDIFEQFRVNRPLWLSIFFIII-FGVNSISNSVGIY 260 

Query: 273  VLLFGI LFGNYALSGQFSLLF I VPGVI INILFST I ARKKGLRFSYVRAI QIGMIGL 328 
            +++LYLGL I+P I L + +KK L + A+ + +IGL 
Sbjct: 261  YVT YNLEREDLVKWYGL I GSLPALVI LP - - FI PRLHQFLGKKKLLNY ALLLNIIGL 314 

Query: 329  LAFGAVLYVGKPGDLSLTSLNLYTILF1VTNIIARYASQAPASLVLTMGADISDYETSES 388 
            LA L + N+Y IL V +IA S + + + +Y + 
Sbjct: 315  LAL LFVPPSNVYLIL--VCRLIAAAGSLTAGGYMWALIPETIEYGEYRT 361 

Query: 389  GRYVSGMIGTIFSLTDSIASSFAPMWGFVLAGIGFSKSFPTIETPLPPDLKMAAISILV 448 
            G+ + G+I I ++VGVLG+ PM + 
Sbjct: 362  GKRMGGLI YAI IGFFFKFGMALGGWPGLVLDKFGY VANQAQTPAALMGILITTT 416 

Query: 449  AIPFIALSIALLLMKFYKLDKEEMVRIQEKIQ 480 
            IP L +AL+ + FY LD+++ + +++ 
Sbjct: 417  IIPVFLLVLALIDINFYNLDEKKYKNMVRELE 448 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
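
Several Examples above report predicted membrane-spanning segments on lines of the form "INTEGRAL Likelihood = -0.53 Transmembrane 197 - 213 ( 197 - 213)". As a rough illustration of how such lines could be tabulated, the sketch below collects the likelihood and both residue ranges for each line and orders the segments along the sequence. The name `parse_transmembrane` and the keys `segment` and `parenthesized_range` are hypothetical labels chosen for this sketch; the text itself does not name the two ranges.

```python
import re

# Assumed line shape, taken from the INTEGRAL lines in this text, e.g.
#   INTEGRAL Likelihood = -0.53 Transmembrane 197 - 213 ( 197 - 213)
#   INTEGRAL Likelihood = -1.17 Transmembrane 5- 21( 5- 21)
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_transmembrane(text):
    """Return one dict per INTEGRAL line, sorted by segment start position."""
    segments = []
    for match in TM_RE.finditer(text):
        segments.append(
            {
                "likelihood": float(match.group(1)),
                "segment": (int(match.group(2)), int(match.group(3))),
                "parenthesized_range": (int(match.group(4)), int(match.group(5))),
            }
        )
    return sorted(segments, key=lambda item: item["segment"][0])


if __name__ == "__main__":
    report = (
        "INTEGRAL Likelihood = -0.53 Transmembrane 197 - 213 ( 197 - 213)\n"
        "INTEGRAL Likelihood = -1.17 Transmembrane 5- 21( 5- 21)\n"
    )
    for entry in parse_transmembrane(report):
        print(entry)
```

Sorting by start position rather than by likelihood gives the order in which the segments occur in the protein, which is usually the more convenient view when counting predicted membrane passes.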

Example 390 

A DNA sequence (GBSx0422) was identified in S.agalactiae <SEQ ID 1269> which encodes the amino 
acid sequence <SEQ ID 1270>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3375 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB17663 GB:U31175 D-specific D-2-hydroxyacid dehydrogenase [S. aureus] 
Identities = 165/331 (49%), Positives = 231/331 (68%), Gaps = 1/331 (0%) 

Query: 1    MMKLKVFNWEEEATIoAQDWANRNHVELSMS 60 
            M K+ F R+ E +A +W +N+VE++ S+ L+ TV++++ +DG+ Q L++ 
Sbjct: 1    MTKIMFFGTRDYEKEMALNWGKKNNVEVTTSlffi 60 

Query: 61   IYPLLKEMGIKQIAQRSAGVD^ra^EIAKQHGIIISOTPSYSPESIAEFTVTIALNLIRK 120 
            +YP L+ GIKQIAQR+AG DMY+L+LAK+H I+ISNVPSYSPE+IAE++V+IAL L+R+ 
Sbjct: 61   VYPKLESYGIKQIAQRTAGFDMYDLDLAKKHNIVISNVPSYSPETIAEYSVSIALQLVRR 120 

Query: 121  VELrRANVREQNFSWTLPIRGRVLGJ^TVAIIGTGRIGIATAKrFKGFGCRVIGYDIYHN 180 
            I V+ +F+W I + + NMTVAIIGTGRIG ATARI + GFG + YD Y N 
Sbjct: 121  FPDIERRVQAHDFTWQAEIMSKPVKNMTVAIIGTGRIGAATAKIYAGFGATITAYDAYPN 180 

Query: 181  PMADGILEYVNSVEEAVEEADLVSLHMPPTAENTHLFOTjDMFKQFKKGAILMNI^GALV 240 
            D L Y +SV+EA+++AD++SLH+P E+ HLF+ MF KKGAIL+N ARGA++ 
Sbjct: 181  KDLD-FLTYKDSVKEAIKDADIISLHVPANKESYHLFDKRMFDHVKKGAILVNAARGAVI 239 

Query: 241  ETKDLLEALDQGLLEGAGIDTYEFEGPYIPKNCQGQDISDKDFLRLINHPKVIYTPHAAY 300 
            T DL+ A++ G h GA IOTYE E Y + +DI DK L LI H +++ TPH A+ 
Sbjct: 240  NTPDLIAAVI^GTLLGAAIDTYENEAAYFTlffiVmCKDIDDKTLLELIEHERILVTPHIAF 299 

Query: 301  YTDEAVKNLVEGALNACVEVIETGTTTTKVN 331 
            ++DEAV+NLVEG LNA + VI TGT T++N 
Sbjct: 300  FSDEAVQNLVEGGLNAALSVINTGTCETRLN 330 

There is also homology to SEQ ID 124. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 391 

A DNA sequence (GBSx0423) was identified in S.agalactiae <SEQ ID 1271> which encodes the amino 
acid sequence <SEQ ID 1272>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2364 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 392 

A DNA sequence (GBSx0424) was identified in S.agalactiae <SEQ ID 1273> which encodes the amino 
acid sequence <SEQ ID 1274>. This protein is predicted to be regulatory protein (pfoS/R). Analysis of this 
protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have a cleavable N-term signal seq. 
INTEGRAL Likelihood =-12.90 Transmembrane 64 - 80 ( 53 - 89) 

Final Results 

bacterial membrane --- Certainty=0.6158 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9325> which encodes amino acid sequence <SEQ ID 9326> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC65034 GB:AE001189 regulatory protein (pfoS/R) [Treponema pallidum] 
Identities = 33/91 (36%), Positives = 55/91 (60%), Gaps = 1/91 (1%) 

Query: 1    MANVLAKPKIMLPMISSAAILGILGALFNIQGTPASAGFGISGLIGPINALNLAKGGWSV 60 
            M N + P + +P++ + + G+L LFN+QGTPASAGFG GL+GPINA L V 
Sbjct: 250  MPMVIRyPII^IPLLLWGLVCGVLAWLPNLQGTPASAGFGFIGLVGPINAYRLMaYTPMV 309 

Query: 61   MNMLLIIIIFVAAPIILNFIFNYLFIKVLKI 91 
            +L ++ FV + + ++ +++ + LK+ 
Sbjct: 310  RAGILFLVYFVLS-FLAAYLIDFILVDRLKL 339 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1275> which encodes the amino acid 
sequence <SEQ ID 1276>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have a cleavable N-term signal seq. 
INTEGRAL Likelihood =-12.31 Transmembrane 141 - 157 ( 133 - 166) 
INTEGRAL Likelihood = -6.00 Transmembrane 92 - 108 ( 88 - 112) 

Final Results 

bacterial membrane --- Certainty=0.5925 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAC65034 GB:AE001189 regulatory protein (pfoS/R) [Treponema 
pallidum] 

35 Identities = 63/178 (35%) , Positives = 107/178 (59%) , Gaps = 10/178 (5%) 

Query: 2 IGQGIASLLGLQPILMSLLIAMIFCFLIVSPITTVGIALAINLSGIGSGAASFG 55 

+G+ IA+ + LQP+LMS+L++M F +I+SP+++V + +A+ L+G+ SGAA+ G 
Sbjct: 164 VGRVIATFIALQPLLMSILLSMSFSLIIISPVSSVAVGIAVGLTGLASGAANIGVSSCAM 223 

Query: 56 -LCLAGWAVNSKGTSLAHVLRSPKISMANVLSKPKIMLPMLCSAAVLGVIGAIFNIQGTP 114

L+ VN G LA +K+MN + P + +P+L + V GV+ +FN+QGTP
Sbjct: 224 TLIVGTMRVNKIGVPLAMFAGAMKMLMPNWIRYPILNIPLLLNGLVCGVLAWLFNLQGTP 283

Query: 115 ASAGFGISGLIGPINALNLAKGGWCP-VNILLIIIIFVGAPIVLNMIFNYLFIKVLKV 171

ASAGFG GL+GPINA L + P V ++ +++ + + +++ + LK+

Sbjct: 284 ASAGFGFIGLVGPINAYRLM--AYTPMVRAGILFLVYFVLSFLAAYLIDFILVDRLKL 339

An alignment of the GAS and GBS proteins is shown below: 

Identities = 86/101 (85%), Positives = 96/101 (94%)

Query: 1   MANVLAKPKIMLPMISSAAILGILGALFNIQGTPASAGFGISGLIGPINALNLAKGGWSV 60
           MANVL+KPKIMLPM+ SAA+LG++GA+FNIQGTPASAGFGISGLIGPINALNLAKGGW
Sbjct: 81  MANVLSKPKIMLPMLCSAAVLGVIGAIFNIQGTPASAGFGISGLIGPINALNLAKGGWCP 140

Query: 61  MNMLLIIIIFVAAPIILNFIFNYLFIKVLKIIDPMDYKLDI 101
           +N+LLIIIIFV API+LN IFNYLFIKVLK+IDPMDYKLDI
Sbjct: 141 VNILLIIIIFVGAPIVLNMIFNYLFIKVLKVIDPMDYKLDI 181

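The identity and positive counts reported for an alignment can be checked directly against its rows. The following is a minimal sketch in Python, not part of the original analysis. It assumes BLAST-style rows of equal width in which the middle row shows '+' for a conservative substitution; the function names are hypothetical, and the sequence strings are the GAS/GBS rows above as read from the specification, with OCR-damaged residues restored from the other alignments in this example.

```python
def alignment_counts(query, match, sbjct):
    """Return (identities, positives, aligned_length) for one alignment block.

    Assumes BLAST-style rows of equal width: the match row shows '+' where the
    substitution is conservative; identical columns are counted from the
    query and subject rows themselves.
    """
    if len(query) != len(sbjct):
        raise ValueError("query and subject rows must have the same width")
    # Identical, non-gap columns.
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    # Positives = identities plus conservative substitutions marked '+'.
    positives = identities + match.count("+")
    return identities, positives, len(query)


# Rows transcribed from the GAS/GBS alignment (assumed reading of the OCR text).
BLOCKS = [
    (
        "MANVLAKPKIMLPMISSAAILGILGALFNIQGTPASAGFGISGLIGPINALNLAKGGWSV",
        "MANVL+KPKIMLPM+ SAA+LG++GA+FNIQGTPASAGFGISGLIGPINALNLAKGGW",
        "MANVLSKPKIMLPMLCSAAVLGVIGAIFNIQGTPASAGFGISGLIGPINALNLAKGGWCP",
    ),
    (
        "MNMLLIIIIFVAAPIILNFIFNYLFIKVLKIIDPMDYKLDI",
        "+N+LLIIIIFV API+LN IFNYLFIKVLK+IDPMDYKLDI",
        "VNILLIIIIFVGAPIVLNMIFNYLFIKVLKVIDPMDYKLDI",
    ),
]


def total_counts(blocks):
    """Sum identities, positives and aligned length over all blocks."""
    totals = [0, 0, 0]
    for query, match, sbjct in blocks:
        for i, value in enumerate(alignment_counts(query, match, sbjct)):
            totals[i] += value
    return tuple(totals)


if __name__ == "__main__":
    identities, positives, length = total_counts(BLOCKS)
    print(f"Identities = {identities}/{length}, Positives = {positives}/{length}")
```

With the rows as transcribed here, the totals agree with the 86/101 identities and 96/101 positives stated for this alignment.
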
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 393 

A DNA sequence (GBSx0426) was identified in S.agalactiae <SEQ ID 1277> which encodes the amino 
acid sequence <SEQ ID 1278>. This protein is predicted to be regulatory protein (pfoS/R). Analysis of this 
protein sequence reveals the following: 
Possible site: 48 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.3633 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


A related GBS nucleic acid sequence <SEQ ID 9735> which encodes amino acid sequence <SEQ ID 9736> 
was also identified. 

A related GBS nucleic acid sequence <SEQ ID 9697> which encodes amino acid sequence <SEQ ID 9698> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC65034 GB:AE001189 regulatory protein (pfoS/R) [Treponema 
pallidum] 

Identities = 61/158 (38%) , Positives = 92/158 (57%) 

Query: 24 KSFIMNVLNGLALGTVIVLIPGAILGELMKALLPMWSGFATLIAATAVATSMMGLVIGIM 83

+ F+M +LNG + G VI L+P AI GEL +AL P+ FA L + +IG +

Sbjct: 9 RQFMMKILNGSSAGIVIGLVPPAIAGELFRALAPLSPLFAALYHVVLPIQFSVPALIGTL 68

Query: 84 VGLNFKFNPIQSASLGLAVMFAGGAATFLKGAIMLKGTGDIINMGITAALGVLLIQFLSD 143 

VGL F + + A+L + A G T GA ++ G GD+IN+ + +AL ++L++ L 
Sbjct: 69 VGLQFHCSAPEVATLAFVSVIASGNVTLQNGAWLITGIGDVINVMLISALAIILVRALRG 128 

Query: 144 KTKSFTLIVIPTVTLLLVGGVGHVLLPYVKMITTMIGQ 181 

K S T+I +P + ++ GGVG LPYVKMIT +G+ 
Sbjct: 129 KLGSLTIIALPVIVAVVAGGVGSFSLPYVKMITLFVGR 166

A related DNA sequence was identified in S.pyogenes <SEQ ID 1279> which encodes the amino acid 
sequence <SEQ ID 1280>. Analysis of this protein sequence reveals the following: 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence 



INTEGRAL Likelihood =-13.06 Transmembrane 314 - 330 ( 301 - 335)
INTEGRAL Likelihood =-11.30 Transmembrane 185 - 201 ( 178 - 215)
INTEGRAL Likelihood = -8.01 Transmembrane 22 - 38 ( 11 - 42)
INTEGRAL Likelihood = -3.29 Transmembrane 266 - 282 ( 265 - 285)
INTEGRAL Likelihood = -2.66 Transmembrane 141 - 157 ( 141 - 159)
INTEGRAL Likelihood = -2.13 Transmembrane 53 - 69 ( 53 - 69)
INTEGRAL Likelihood = -1.33 Transmembrane 114 - 130 ( 113 - 131)
INTEGRAL Likelihood = -0.80 Transmembrane 206 - 222 ( 206 - 222)



Final Results 

bacterial membrane --- Certainty=0.6222 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC65034 GB:AE001189 regulatory protein (pfoS/R) [Treponema 
pallidum] 

Identities = 137/346 (39%) , Positives = 217/346 (62%) , Gaps = 14/346 (4%) 

Query: 12 FMNKVLAGTAIAIVVALIPNAILATFLKPLLP-NMAAAEFLHIVQVFQFFTPIMAGFLIG 70

FM K+L G++ IV+ L+P AI + L P + A H+V QF P + G L+G

Sbjct: 11 FMMKILNGSSAGIVIGLVPPAIAGELFRALAPLSPLFAALYHVVLPIQFSVPALIGTLVG 70

Query: 71 QQFKFNPMQQLAVGGAAYIGSGAWAYTEVIQKGVATGTFQLRGIGDLINMMITASLAVLA 130 

QF++++1SG +G++ GIGD+IN+M+ ++LA++ 

Sbjct: 71 LQFHCSAPEVATLAFVSVIASG NVTLQNGAWLITGIGDVINVMLISALAIIL 122 

Query: 131 VKYFGNKFGSLTIILLPITIGTGVGYIGWKFLPYVSYVTTLIGQGINSFTTLQPILMSIL 190

V+ K GSLTII LP+ + G +G LPYV +T +G+ I +F LQP+LMSIL
Sbjct: 123 VRALRGKLGSLTIIALPVIVAVVAGGVGSFSLPYVKMITLFVGRVIATFIALQPLLMSIL 182

Query: 191 IAVAFSLIIVSPISTVAIGLAIGLNGMAAGAASMGIASTAAVLVWATLKVNKSGVPIAIA 250 

++++FSLII+SP+S+VA+G+A+GL G+A+GAA++G++S A L+ T++VNK GVP+A+ 
Sbjct: 183 LSMSFSLIIISPVSSVAVGIAVGLTGLASGAANIGVSSCAMTLIVGTMRVNKIGVPLAMF 242 

Query: 251 LGAM™WPNFLKHPIMAIPMVFTAAISSLTVPLFNLVGTPASSGFGLVGAVGPIAS--L 308 

GAMKM+MPN++++PI+ IP++ + + LFNL GTPAS+GFG +G VGPI + L 
Sbjct: 243 AGAMKMLMPNWIRYPILNIPLLLNGLVCGVLAWLFNLQGTPASAGFGFIGLVGPINAYRL 302

Query: 309 AGGSS IL II ILAWI IVPFAVAFAAHKVSKDILKLYKEDIFVFE 351 

+ ++ 1+ L + ++ F A+ + D LKLY+ ++F+ E 
Sbjct: 303 MAYTPMVRAGILFLVYFVLSFLAAYLIDFILVDRLKLYRRELFIPE 348 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 65/172 (37%) , Positives = 95/172 (54%) , Gaps = 9/172 (5%) 

Query: 19 EKQTTKSFIMNVLNGLALGTVIVLIPGAILGELMKALLPMWSGFATLIAATAVATSMMGL 78

+K+T SF+ VL G A+ V+ LIP AIL +K LLP + A + V +
Sbjct: 5 DKETFSSFMNKVLAGTAIAIVVALIPNAILATFLKPLLPNMAA-AEFLHIVQVFQFFTPI 63

Query: 79 VIGIMVGLNFKFNPIQSASLGLAVMFAGGAATFLK GAIMLKGTGDIINMGIT 130 

+ G ++G FKFNP+Q ++G A GA + + G L+G GD+INM IT 

Sbjct: 64 MAGFLIGQQFKFNPMQQLAVGGAAYIGSGAWAYTEVIQKGVATGTFQLRGIGDLINMMIT 123 

Query: 131 AALGVLLIQFLSDKTKSFTLIVIPTVTLLLVGGVGHVLLPYVKMITTMIGQG 182 

A+L VL +++ +K S T+I++P VG +G LPYV +TT+IGQG 

Sbjct: 124 ASLAVLAVKYFGNKFGSLTIILLPITIGTGVGYIGWKFLPYVSYVTTLIGQG 175 

A related GBS gene <SEQ ID 8567> and protein <SEQ ID 8568> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: -13.49 
GvH: Signal Score (-7.5): -5.82 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 5 value: -6.58 threshold: 0.0
modified ALOM score: 1.82



*** Reasoning Step: 3 
Final Results

bacterial membrane --- Certainty=0.3633 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01226(352 - 843 of 1218) 

EGAD|138195|TP0038(3 - 166 of 350) regulatory protein {Treponema pallidum} OMNI|TP0038
regulatory protein (pfoS/R) GP|3322295|gb|AAC65034.1||AE001189 regulatory protein (pfoS/R)
{Treponema pallidum} PIR|E71373|E71373 probable regulatory protein (pfoS/R) - syphilis
spirochete
%Match = 13.6

%Identity = 37.2 %Similarity = 59.1

Matches = 61 Mismatches = 67 Conservative Sub.s = 36 



273 303 333 363 393 423 453 483 

I*FFPIFLLQIAMI*LI*LVKSQTIIISRRHLMSDVVEKQTTKSFIMNVIiNGLALGWIvLIPGAILGELMKALLPMWSG 

: = = hi HII : I II hi II III =11 h 
MHTQSLSPRQFMMKILNGSSAGIVIGLVPPAIAGELFRALAPLSPL 

10 20 30 40 

513 543 573 603 633 663 693 723 

FATLIAATAVATSMMGLVIGIMVGLNFKFNPIQSASLGLAVMFAGGAATFLKGAIMLKGTGDIINMGITAALGVLLIQFL 



FAALYHWLPIQFSVPALIGTLVGLQFHCSAPEVATIAFVSVIASGNVTLQNGAWLITGIGDVINVMLISALA.IILVRAL 
60 70 80 90 100 110 120 

753 783 813 843 873 903 933 963 

SDKTKSFTLIVIPTVTLLLVGGVGHVLLPWKMITTMIGQGTRRTHENFLFILLCPDINFEKIPF*INDLLSLFLQIIGL 

I hhl :| : llll llllllll =h 

RGKLGSLTIIALPVIVAWAGGVGSFSLPYVKMITLFVGRVIATFIALQPIiLMSILLSMSFSLIIISPVSSVAVGIAVGL 

140 150 160 170 180 190 200 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 394 

A DNA sequence (GBSx0428) was identified in S.agalactiae <SEQ ID 1281> which encodes the amino 
acid sequence <SEQ ID 1282>. This protein is predicted to be cyn operon transcriptional activator. Analysis 
of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15857 GB:Z99123 alternate gene name: ipa-24d~similar to 

transcriptional regulator (LysR family) [Bacillus subtilis] 
Identities = 87/282 (30%) , Positives = 152/282 (53%) , Gaps = 5/282 (1%) 

Query: 1 MDIRQLTYFIAVAEAKNYSRAAKSLFVTQPTLSQSIKK^ 60 

MDIR LTYF+ VA K++++A++SL+V+QPT+S+ IK LE EL LF +NGRQ+ LT+A 
Sbjct: 1 MDIRHLTYFLEVARLKSFTKASQSLyVSQPTISKMIKNLEEELGIELFYRNGRQVELTDA 60 

Query: 61 GEILYEKGQLLMTNVNQMVTEIQQLNQEKKEGIRVGLTSLFAIQFMKQI-STFMATHSNV 119 

G +Y + Q ++ + + +E+ + + KK +R+GL + F ++ F + NV 
Sbjct: 61 GHSMWQAQEIIKSFQNLTSELNDIMEVKKGHvRIGLPPMIGSGFFPRVLGDFRENYPNV 120 
Query: 120 EVSLIQDGSRKLQELLAKGKIDIGLLSFPSTRHDITIEPLQTSTKGYKVSIVMPKSHPLA 179

L++DGS K+QE + G +DIG++ P+ + + T + +V+ SH LA 
Sbjct: 121 TFQLVEDGSIKVQEGVGDGSLDIGVWLPANEDIFHSFTIVKET LMLVVHPSHRLA 176 

Query: 180 TLPEIELNDLRDYKVASLNEHYMLGEMLPRKCRALGFDPHIVFKHNDWEVLIHSLQDLNA 239

E +L +L+D E ++L + +C GF PHI+++ + W+ + + 

Sbjct: 177 DEKECQLRELKDEPFIFFREDFVLHNRIMTECIKAGFRPHIIYETSQWDFISEMVSANLG 236 

Query: 240 VTILPSEFESISQVQDLCWVPLKDKNNFYPIGIAYRNDTSFS 281 
+ +LP + + +PL D + + I +R D S

Sbjct: 237 IGLLPERICRGLDPEKVKVIPLVDPVIPWHLAITWRKDRYLS 278

A related DNA sequence was identified in S.pyogenes <SEQ ID 1283> which encodes the amino acid 

sequence <SEQ ID 1284>. Analysis of this protein sequence reveals the following: 

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1101 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 125/160 (78%) , Positives = 144/160 (89%) 

Query: 135 LAKGKIDIGLLSFPSTRNDITIEPLQTSTKGYKVSIVMPKSHPLATLPEIELNDLRDYKV 194
           L++GKIDIGLLSF S R DITIE LQTSTKGYKVSIV+ K HPLA  P+++L DL+ YK+
Sbjct: 1   LSQGKIDIGLLSFLSIRKDITIELLQTSTKGYKVSIVLLKQHPLAQHPQLKLKDLKGYKI 60

Query: 195 ASLNEHYMLGEMLPRKCRALGFDPHIVFKHNDWEVLIHSLQDLNAVTILPSEFESISQVQ 254
           ASLN+HYMLGEMLPRKCRALGF+P IVFKHNDWEVLIHSL DLN +TILPS+FES++QV
Sbjct: 61  ASLNDHYMLGEMLPRKCRALGFEPDIVFKHNDWEVLIHSLHDLNTLTILPSDFESLNQVD 120

Query: 255 DLCWVPLKDKNNFYPIGIAYRNDTSFSPMIEEFLSLLKTN 294
           +L W+PL+DKNNFYPIGIAYR+D SFSP+IEEFLSLLKTN
Sbjct: 121 NLVWIPLQDKNNFYPIGIAYRDDASFSPVIEEFLSLLKTN 160

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 395

A DNA sequence (GBSx0429) was identified in S.agalactiae <SEQ ID 1285> which encodes the amino 
acid sequence <SEQ ID 1286>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1833 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

Signal peptide: 1-21 

A related GBS nucleic acid sequence <SEQ ID 8569> which encodes amino acid sequence <SEQ ID 8570> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes. 



